Overview

Dataset statistics

Number of variables 41
Number of observations 4800
Missing cells 122464
Missing cells (%) 62.2%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 1.5 MiB
Average record size in memory 328.0 B
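
The missing-cell percentage follows from the counts above. A quick check, assuming that "cells" means variables multiplied by observations:

```python
# Figures copied from the dataset statistics above.
n_variables = 41
n_observations = 4800
missing_cells = 122464

# Assumption: the cell total is variables x observations.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells, {missing_pct:.1f}% missing")
```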

Variable types

Categorical 34
Numeric 6
Unsupported 1

Dataset

Description Dataset about students who were invited to pose and answer questions to each other via a chatbot application, supported by Telegram
Creator Matteo Busso, Massimo Stefan
Author Fausto Giunchiglia, Ivano Bison, Matteo Busso, Ronald Chenu-Abente, Marcelo Rodas Britez, Can Gunel, Giuseppe Veltri, Amalia de Götzen, Peter Kun, Amarsanaa Ganbold, Altangerel Chagnaa, George Gaskell, Miriam Bidoglia, Luca Cernuzzi, Alethia Hume, Jose Luis Zarza, Daniele Miorandi, Carlo Caprini
URL
Copyright (c) KnowDive 2022

Variable descriptions

university University where the experiment took place
id The task’s id
taskTypeId The type of the task
lastUpdateTs Timestamp of last update of the task
creationTs Creation timestamp of the task
requesterId The Id of the user making the task (asking a question)
appId The chatbot deployment
closeTs Closing timestamp of the task (if the task had an accepted answer)
communityId The Id of the users in the community
transactions.taskId The task’s id
transactions.label Label of action on task (answerTransaction, bestAnswerTransaction, CREATE_TASK, moreAnswerTransaction, notAnswerTransaction, reportAnswerTransaction, reportQuestionTransaction)
transactions.creationTs Creation timestamp of transaction
transactions.actioneerId Id of the user who performed the transaction action
transactions.lastUpdateTs Last update timestamp of transaction
transactions.count.id Count of follow-up actions on the task (the higher the number, the more actions were performed on the task)
transactions.messages.appId App Id of the transaction message
transactions.messages.receiverId User Id of the transaction message
transactions.messages.label Label of transaction message (AnsweredPickedMessage, AnsweredQuestionMessage, QuestionToAnswerMessage)
transactions.messages.attributes.taskId taskId of the transaction message
transactions.messages.attributes.question Question text of the transaction
transactions.messages.attributes.userId User Id of the person asking the question
transactions.messages.attributes.anonymous Is the question anonymous?
transactions.messages.attributes.sensitive Is the question sensitive?
transactions.messages.attributes.positionOfAnswerer Physical proximity of questioner
transactions.messages.attributes.transactionId Id of the transaction
transactions.messages.attributes.answer Answer on a question
transactions.attributes.answer Answer on a question
transactions.attributes.anonymous Anonymous answer
transactions.attributes.reason Reason for accepting an answer
transactions.attributes.transactionId Id of transaction
transactions.attributes.helpful How helpful the accepted answer was
goal.name Question (without duplicating extended questions)
goal.description Empty column
attributes.domain Question’s domain
attributes.domainInterest Similar-different domain
attributes.beliefsAndValues Similar-different beliefs and values
attributes.sensitive Is it a sensitive question
attributes.anonymous Is it an anonymous question
attributes.socialCloseness Close-far social closeness
attributes.positionOfAnswerer Physical proximity of answerer
attributes.maxUsers Number of users the question is forwarded to
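
The dotted variable names (for example transactions.messages.attributes.taskId) suggest that nested records were flattened into columns. The report does not say how this was done, so the sketch below is only an assumption about the general technique; the function name `flatten` and the example record are hypothetical.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical record shaped like the variable names above; values are invented.
example = {
    "id": "task-1",
    "goal": {"name": "example question"},
    "attributes": {"sensitive": False, "anonymous": True},
}
flat = flatten(example)
print(sorted(flat))
```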

Alerts

taskTypeId has constant value "618d504ed844da03b28cc4bf" Constant
attributes.maxUsers has constant value "15.0" Constant
id has a high cardinality: 725 distinct values High cardinality
lastUpdateTs has a high cardinality: 725 distinct values High cardinality
creationTs has a high cardinality: 712 distinct values High cardinality
closeTs has a high cardinality: 269 distinct values High cardinality
transactions.taskId has a high cardinality: 725 distinct values High cardinality
transactions.creationTs has a high cardinality: 4796 distinct values High cardinality
transactions.lastUpdateTs has a high cardinality: 4784 distinct values High cardinality
transactions.messages.receiverId has a high cardinality: 725 distinct values High cardinality
transactions.messages.attributes.taskId has a high cardinality: 725 distinct values High cardinality
transactions.messages.attributes.question has a high cardinality: 688 distinct values High cardinality
transactions.messages.attributes.answer has a high cardinality: 2695 distinct values High cardinality
transactions.attributes.answer has a high cardinality: 2702 distinct values High cardinality
transactions.attributes.reason has a high cardinality: 257 distinct values High cardinality
goal.name has a high cardinality: 688 distinct values High cardinality
requesterId is highly correlated with transactions.actioneerId and 1 other fields High correlation
transactions.actioneerId is highly correlated with requesterId and 1 other fields High correlation
transactions.count.id is highly correlated with transactions.messages.attributes.transactionId and 1 other fields High correlation
transactions.messages.attributes.userId is highly correlated with requesterId and 1 other fields High correlation
transactions.messages.attributes.anonymous is highly correlated with transactions.messages.attributes.sensitive and 3 other fields High correlation
transactions.messages.attributes.sensitive is highly correlated with transactions.messages.attributes.anonymous and 2 other fields High correlation
transactions.messages.attributes.transactionId is highly correlated with transactions.count.id and 1 other fields High correlation
transactions.attributes.anonymous is highly correlated with transactions.messages.attributes.anonymous High correlation
transactions.attributes.transactionId is highly correlated with transactions.count.id and 1 other fields High correlation
attributes.sensitive is highly correlated with transactions.messages.attributes.anonymous and 2 other fields High correlation
attributes.anonymous is highly correlated with transactions.messages.attributes.anonymous and 2 other fields High correlation
university is highly correlated with requesterId and 5 other fields High correlation
requesterId is highly correlated with university and 5 other fields High correlation
appId is highly correlated with university and 5 other fields High correlation
communityId is highly correlated with university and 5 other fields High correlation
transactions.label is highly correlated with transactions.messages.label High correlation
transactions.actioneerId is highly correlated with university and 5 other fields High correlation
transactions.count.id is highly correlated with transactions.messages.attributes.transactionId and 1 other fields High correlation
transactions.messages.appId is highly correlated with university and 5 other fields High correlation
transactions.messages.label is highly correlated with transactions.label High correlation
transactions.messages.attributes.userId is highly correlated with university and 5 other fields High correlation
transactions.messages.attributes.anonymous is highly correlated with transactions.messages.attributes.sensitive and 3 other fields High correlation
transactions.messages.attributes.sensitive is highly correlated with transactions.messages.attributes.anonymous and 2 other fields High correlation
transactions.messages.attributes.positionOfAnswerer is highly correlated with attributes.positionOfAnswerer High correlation
transactions.messages.attributes.transactionId is highly correlated with transactions.count.id and 1 other fields High correlation
transactions.attributes.anonymous is highly correlated with transactions.messages.attributes.anonymous High correlation
transactions.attributes.transactionId is highly correlated with transactions.count.id and 1 other fields High correlation
attributes.domainInterest is highly correlated with attributes.beliefsAndValues and 1 other fields High correlation
attributes.beliefsAndValues is highly correlated with attributes.domainInterest and 1 other fields High correlation
attributes.sensitive is highly correlated with transactions.messages.attributes.anonymous and 2 other fields High correlation
attributes.anonymous is highly correlated with transactions.messages.attributes.anonymous and 2 other fields High correlation
attributes.socialCloseness is highly correlated with attributes.domainInterest and 1 other fields High correlation
attributes.positionOfAnswerer is highly correlated with transactions.messages.attributes.positionOfAnswerer High correlation
id has 4075 (84.9%) missing values Missing
taskTypeId has 4075 (84.9%) missing values Missing
lastUpdateTs has 4075 (84.9%) missing values Missing
creationTs has 4075 (84.9%) missing values Missing
requesterId has 4075 (84.9%) missing values Missing
appId has 4075 (84.9%) missing values Missing
closeTs has 4531 (94.4%) missing values Missing
communityId has 4075 (84.9%) missing values Missing
transactions.messages.appId has 3600 (75.0%) missing values Missing
transactions.messages.receiverId has 4075 (84.9%) missing values Missing
transactions.messages.label has 2419 (50.4%) missing values Missing
transactions.messages.attributes.taskId has 3600 (75.0%) missing values Missing
transactions.messages.attributes.question has 777 (16.2%) missing values Missing
transactions.messages.attributes.userId has 1066 (22.2%) missing values Missing
transactions.messages.attributes.anonymous has 3398 (70.8%) missing values Missing
transactions.messages.attributes.sensitive has 3852 (80.2%) missing values Missing
transactions.messages.attributes.positionOfAnswerer has 3852 (80.2%) missing values Missing
transactions.messages.attributes.transactionId has 1730 (36.0%) missing values Missing
transactions.messages.attributes.answer has 2003 (41.7%) missing values Missing
transactions.attributes.answer has 1996 (41.6%) missing values Missing
transactions.attributes.anonymous has 1996 (41.6%) missing values Missing
transactions.attributes.reason has 4518 (94.1%) missing values Missing
transactions.attributes.transactionId has 4524 (94.2%) missing values Missing
transactions.attributes.helpful has 4527 (94.3%) missing values Missing
goal.name has 4075 (84.9%) missing values Missing
goal.description has 4800 (100.0%) missing values Missing
attributes.domain has 4075 (84.9%) missing values Missing
attributes.domainInterest has 4075 (84.9%) missing values Missing
attributes.beliefsAndValues has 4075 (84.9%) missing values Missing
attributes.sensitive has 4075 (84.9%) missing values Missing
attributes.anonymous has 4075 (84.9%) missing values Missing
attributes.socialCloseness has 4075 (84.9%) missing values Missing
attributes.positionOfAnswerer has 4075 (84.9%) missing values Missing
attributes.maxUsers has 4075 (84.9%) missing values Missing
id is uniformly distributed Uniform
lastUpdateTs is uniformly distributed Uniform
creationTs is uniformly distributed Uniform
closeTs is uniformly distributed Uniform
transactions.creationTs is uniformly distributed Uniform
transactions.lastUpdateTs is uniformly distributed Uniform
transactions.messages.receiverId is uniformly distributed Uniform
transactions.messages.attributes.taskId is uniformly distributed Uniform
transactions.messages.attributes.answer is uniformly distributed Uniform
transactions.attributes.answer is uniformly distributed Uniform
transactions.attributes.reason is uniformly distributed Uniform
goal.name is uniformly distributed Uniform
goal.description is an unsupported type, check if it needs cleaning or further analysis Unsupported
transactions.count.id has 725 (15.1%) zeros Zeros
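
Many of the Missing alerts share the count 4075. That figure equals the 4800 observations minus the 725 distinct task ids, which suggests that task-level fields are filled on one row per task only. This is an inference from the counts, not something the report states. A short check of the arithmetic:

```python
# Figures copied from the overview and the alerts above.
n_rows = 4800
n_tasks = 725  # distinct values reported for `id`

# Rows left without task-level values if each task fills exactly one row.
missing_task_level = n_rows - n_tasks
missing_pct = 100 * missing_task_level / n_rows

# The Zeros alert for transactions.count.id reports the same 725 rows.
zeros_pct = 100 * n_tasks / n_rows

print(missing_task_level, f"{missing_pct:.1f}%", f"{zeros_pct:.1f}%")
```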

Reproduction

Analysis started 2022-07-04 18:21:11.032678
Analysis finished 2022-07-04 18:21:35.592306
Duration 24.56 seconds
Software version pandas-profiling v3.2.0
Download configuration config.json
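
The reported duration can be checked from the two timestamps above. A minimal sketch using only the standard library:

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
fmt = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2022-07-04 18:21:11.032678", fmt)
finished = datetime.strptime("2022-07-04 18:21:35.592306", fmt)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```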

Variables

university
Categorical

HIGH CORRELATION

University where the experiment took place

Distinct 5
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 37.6 KiB
AAU 1493
NUM 1455
UC 1028
UNITN 561
LSE 263

Length

Max length 5
Median length 3
Mean length 3.019583333
Min length 2

Characters and Unicode

Total characters 14494
Distinct characters 10
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
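
The length statistics and the total character count for university can be reproduced from the five value counts listed above. A minimal sketch:

```python
from statistics import median

# Value counts copied from the university summary above.
counts = {"AAU": 1493, "NUM": 1455, "UC": 1028, "UNITN": 561, "LSE": 263}

n_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_rows

# Expand to one length per row to take the median, minimum and maximum.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]
median_length = median(lengths)

print(total_chars, round(mean_length, 9), median_length, min(lengths), max(lengths))
```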

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row UC
2nd row UC
3rd row UC
4th row UC
5th row UC

Common Values

Value Count Frequency (%)
AAU 1493 31.1%
NUM 1455 30.3%
UC 1028 21.4%
UNITN 561 11.7%
LSE 263 5.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
aau 1493 31.1%
num 1455 30.3%
uc 1028 21.4%
unitn 561 11.7%
lse 263 5.5%

Most occurring characters

Value Count Frequency (%)
U 4537 31.3%
A 2986 20.6%
N 2577 17.8%
M 1455 10.0%
C 1028 7.1%
I 561 3.9%
T 561 3.9%
L 263 1.8%
S 263 1.8%
E 263 1.8%
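
The character counts above follow directly from the university value counts: each character of a value is counted once per row holding that value. A short sketch with `collections.Counter`:

```python
from collections import Counter

# Value counts copied from the university summary.
counts = {"AAU": 1493, "NUM": 1455, "UC": 1028, "UNITN": 561, "LSE": 263}

char_counts = Counter()
for value, n in counts.items():
    for ch in value:
        # Every occurrence of a character contributes once per row.
        char_counts[ch] += n

total = sum(char_counts.values())
for ch, n in char_counts.most_common(3):
    print(ch, n, f"{100 * n / total:.1f}%")
```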

Most occurring categories

Value Count Frequency (%)
Uppercase Letter 14494 100.0%

Most frequent character per category

Uppercase Letter
Value Count Frequency (%)
U 4537 31.3%
A 2986 20.6%
N 2577 17.8%
M 1455 10.0%
C 1028 7.1%
I 561 3.9%
T 561 3.9%
L 263 1.8%
S 263 1.8%
E 263 1.8%

Most occurring scripts

Value Count Frequency (%)
Latin 14494 100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
U 4537 31.3%
A 2986 20.6%
N 2577 17.8%
M 1455 10.0%
C 1028 7.1%
I 561 3.9%
T 561 3.9%
L 263 1.8%
S 263 1.8%
E 263 1.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 14494 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
U 4537 31.3%
A 2986 20.6%
N 2577 17.8%
M 1455 10.0%
C 1028 7.1%
I 561 3.9%
T 561 3.9%
L 263 1.8%
S 263 1.8%
E 263 1.8%

id
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

The task’s id

Distinct 725
Distinct (%) 100.0%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
619c473359410d1c776ddf3f 1
619c4ed159410d1c776ddf41 1
619c5adb59410d1c776ddf42 1
619c631859410d1c776ddf43 1
619c72df59410d1c776ddf44 1
Other values (720) 720

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 17400
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 725
Unique (%) 100.0%

Sample

1st row 619b9aa959410d1c776ddf15
2nd row 619ba35b59410d1c776ddf19
3rd row 619ba62f59410d1c776ddf1a
4th row 619ba6f659410d1c776ddf1b
5th row 619bc11a59410d1c776ddf25

Common Values

Value Count Frequency (%)
619c473359410d1c776ddf3f 1 < 0.1%
619c4ed159410d1c776ddf41 1 < 0.1%
619c5adb59410d1c776ddf42 1 < 0.1%
619c631859410d1c776ddf43 1 < 0.1%
619c72df59410d1c776ddf44 1 < 0.1%
619c740059410d1c776ddf45 1 < 0.1%
619c8ea359410d1c776ddf46 1 < 0.1%
619c8f5b59410d1c776ddf47 1 < 0.1%
619c92e659410d1c776ddf48 1 < 0.1%
619c937559410d1c776ddf49 1 < 0.1%
Other values (715) 715 14.9%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Value Count Frequency (%)
619c473359410d1c776ddf3f 1 0.1%
619e77de59410d1c776de000 1 0.1%
619ba35b59410d1c776ddf19 1 0.1%
619ba62f59410d1c776ddf1a 1 0.1%
619ba6f659410d1c776ddf1b 1 0.1%
619bc11a59410d1c776ddf25 1 0.1%
619bc7c259410d1c776ddf28 1 0.1%
619bdfba59410d1c776ddf2e 1 0.1%
619c0aaf59410d1c776ddf34 1 0.1%
619cea9459410d1c776ddf6b 1 0.1%
Other values (715) 715 98.6%

Most occurring characters

Value Count Frequency (%)
1 2112 12.1%
d 1749 10.1%
6 1610 9.3%
5 1491 8.6%
0 1465 8.4%
9 1297 7.5%
7 1068 6.1%
8 1013 5.8%
f 935 5.4%
b 907 5.2%
Other values (6) 3753 21.6%

Most occurring categories

Value Count Frequency (%)
Decimal Number 11801 67.8%
Lowercase Letter 5599 32.2%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 2112 17.9%
6 1610 13.6%
5 1491 12.6%
0 1465 12.4%
9 1297 11.0%
7 1068 9.1%
8 1013 8.6%
4 666 5.6%
3 651 5.5%
2 428 3.6%

Lowercase Letter
Value Count Frequency (%)
d 1749 31.2%
f 935 16.7%
b 907 16.2%
c 902 16.1%
a 632 11.3%
e 474 8.5%

Most occurring scripts

Value Count Frequency (%)
Common 11801 67.8%
Latin 5599 32.2%

Most frequent character per script

Common
Value Count Frequency (%)
1 2112 17.9%
6 1610 13.6%
5 1491 12.6%
0 1465 12.4%
9 1297 11.0%
7 1068 9.1%
8 1013 8.6%
4 666 5.6%
3 651 5.5%
2 428 3.6%

Latin
Value Count Frequency (%)
d 1749 31.2%
f 935 16.7%
b 907 16.2%
c 902 16.1%
a 632 11.3%
e 474 8.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 17400 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 2112 12.1%
d 1749 10.1%
6 1610 9.3%
5 1491 8.6%
0 1465 8.4%
9 1297 7.5%
7 1068 6.1%
8 1013 5.8%
f 935 5.4%
b 907 5.2%
Other values (6) 3753 21.6%

taskTypeId
Categorical

CONSTANT
MISSING
REJECTED

The type of the task

Distinct 1
Distinct (%) 0.1%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
618d504ed844da03b28cc4bf 725

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 17400
Distinct characters 14
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 618d504ed844da03b28cc4bf
2nd row 618d504ed844da03b28cc4bf
3rd row 618d504ed844da03b28cc4bf
4th row 618d504ed844da03b28cc4bf
5th row 618d504ed844da03b28cc4bf

Common Values

Value Count Frequency (%)
618d504ed844da03b28cc4bf 725 15.1%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
618d504ed844da03b28cc4bf 725 100.0%

Most occurring characters

Value Count Frequency (%)
4 2900 16.7%
8 2175 12.5%
d 2175 12.5%
0 1450 8.3%
b 1450 8.3%
c 1450 8.3%
6 725 4.2%
1 725 4.2%
5 725 4.2%
e 725 4.2%
Other values (4) 2900 16.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 10150 58.3%
Lowercase Letter 7250 41.7%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
4 2900 28.6%
8 2175 21.4%
0 1450 14.3%
6 725 7.1%
1 725 7.1%
5 725 7.1%
3 725 7.1%
2 725 7.1%

Lowercase Letter
Value Count Frequency (%)
d 2175 30.0%
b 1450 20.0%
c 1450 20.0%
e 725 10.0%
a 725 10.0%
f 725 10.0%

Most occurring scripts

Value Count Frequency (%)
Common 10150 58.3%
Latin 7250 41.7%

Most frequent character per script

Common
Value Count Frequency (%)
4 2900 28.6%
8 2175 21.4%
0 1450 14.3%
6 725 7.1%
1 725 7.1%
5 725 7.1%
3 725 7.1%
2 725 7.1%

Latin
Value Count Frequency (%)
d 2175 30.0%
b 1450 20.0%
c 1450 20.0%
e 725 10.0%
a 725 10.0%
f 725 10.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 17400 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
4 2900 16.7%
8 2175 12.5%
d 2175 12.5%
0 1450 8.3%
b 1450 8.3%
c 1450 8.3%
6 725 4.2%
1 725 4.2%
5 725 4.2%
e 725 4.2%
Other values (4) 2900 16.7%

lastUpdateTs
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Timestamp of last update of the task

Distinct 725
Distinct (%) 100.0%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
2021-12-06 10:01:13 1
2021-11-25 03:57:34 1
2021-11-25 04:06:35 1
2021-11-25 04:00:54 1
2021-11-23 12:54:03 1
Other values (720) 720

Length

Max length 19
Median length 19
Mean length 19
Min length 19

Characters and Unicode

Total characters 13775
Distinct characters 13
Distinct categories 4
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 725
Unique (%) 100.0%

Sample

1st row 2021-11-22 15:04:38
2nd row 2021-11-23 09:03:34
3rd row 2021-11-22 11:19:30
4th row 2021-11-22 11:22:25
5th row 2021-11-22 13:34:15
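
The report types lastUpdateTs as Categorical because the timestamps are stored as 19-character strings. Before any time-based analysis they would need to be parsed. A minimal sketch using three of the sample values above; the format string is inferred from those samples and is an assumption about the rest of the column:

```python
from datetime import datetime

# Sample values copied from the rows above.
samples = ["2021-11-22 15:04:38", "2021-11-23 09:03:34", "2021-11-22 11:19:30"]

# Assumed layout: year-month-day, a space, then hour:minute:second.
fmt = "%Y-%m-%d %H:%M:%S"
parsed = [datetime.strptime(s, fmt) for s in samples]

earliest = min(parsed)
print(earliest.isoformat())
```

With a pandas DataFrame, `pd.to_datetime` applied to the column would play the same role.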

Common Values

Value Count Frequency (%)
2021-12-06 10:01:13 1 < 0.1%
2021-11-25 03:57:34 1 < 0.1%
2021-11-25 04:06:35 1 < 0.1%
2021-11-25 04:00:54 1 < 0.1%
2021-11-23 12:54:03 1 < 0.1%
2021-12-06 10:10:36 1 < 0.1%
2021-12-06 10:11:19 1 < 0.1%
2021-11-25 03:42:29 1 < 0.1%
2021-11-28 11:31:09 1 < 0.1%
2021-11-23 15:12:56 1 < 0.1%
Other values (715) 715 14.9%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Value Count Frequency (%)
2021-11-25 91 6.3%
2021-12-06 70 4.8%
2021-11-23 68 4.7%
2021-12-03 61 4.2%
2021-11-29 54 3.7%
2021-12-02 50 3.4%
2021-11-24 48 3.3%
2021-11-26 42 2.9%
2021-11-22 35 2.4%
2021-11-30 33 2.3%
Other values (740) 898 61.9%

Most occurring characters

Value Count Frequency (%)
2 2860 20.8%
1 2795 20.3%
0 1693 12.3%
- 1450 10.5%
: 1450 10.5%
(space) 725 5.3%
3 670 4.9%
5 587 4.3%
4 507 3.7%
9 281 2.0%
Other values (3) 757 5.5%

Most occurring categories

Value Count Frequency (%)
Decimal Number 10150 73.7%
Dash Punctuation 1450 10.5%
Other Punctuation 1450 10.5%
Space Separator 725 5.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
2 2860 28.2%
1 2795 27.5%
0 1693 16.7%
3 670 6.6%
5 587 5.8%
4 507 5.0%
9 281 2.8%
6 267 2.6%
7 251 2.5%
8 239 2.4%

Dash Punctuation
Value Count Frequency (%)
- 1450 100.0%

Other Punctuation
Value Count Frequency (%)
: 1450 100.0%

Space Separator
Value Count Frequency (%)
(space) 725 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 13775 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
2 2860 20.8%
1 2795 20.3%
0 1693 12.3%
- 1450 10.5%
: 1450 10.5%
(space) 725 5.3%
3 670 4.9%
5 587 4.3%
4 507 3.7%
9 281 2.0%
Other values (3) 757 5.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 13775 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
2 2860 20.8%
1 2795 20.3%
0 1693 12.3%
- 1450 10.5%
: 1450 10.5%
(space) 725 5.3%
3 670 4.9%
5 587 4.3%
4 507 3.7%
9 281 2.0%
Other values (3) 757 5.5%

creationTs
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Creation timestamp of the task

Distinct 712
Distinct (%) 98.2%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
2021-12-02 17:56:14 4
2021-11-23 16:30:34 3
2021-11-30 21:33:02 2
2021-11-27 20:16:09 2
2021-11-29 10:15:34 2
Other values (707) 712

Length

Max length 19
Median length 19
Mean length 19
Min length 19

Characters and Unicode

Total characters 13775
Distinct characters 13
Distinct categories 4
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 702
Unique (%) 96.8%
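
The Distinct (%) and Unique (%) figures for creationTs are consistent with percentages taken over the non-missing rows rather than over all 4800 observations. This reading is inferred from the numbers; the sketch below only checks the arithmetic:

```python
# Figures copied from the creationTs summary above.
n_rows = 4800
missing = 4075
distinct = 712  # number of different values
unique = 702    # values that occur exactly once

non_missing = n_rows - missing
distinct_pct = 100 * distinct / non_missing
unique_pct = 100 * unique / non_missing

print(non_missing, f"{distinct_pct:.1f}%", f"{unique_pct:.1f}%")
```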

Sample

1st row 2021-11-22 10:27:06
2nd row 2021-11-22 11:04:11
3rd row 2021-11-22 11:16:15
4th row 2021-11-22 11:19:34
5th row 2021-11-22 13:11:06

Common Values

Value Count Frequency (%)
2021-12-02 17:56:14 4 0.1%
2021-11-23 16:30:34 3 0.1%
2021-11-30 21:33:02 2 < 0.1%
2021-11-27 20:16:09 2 < 0.1%
2021-11-29 10:15:34 2 < 0.1%
2021-11-29 16:30:37 2 < 0.1%
2021-11-30 08:13:56 2 < 0.1%
2021-12-02 13:26:40 2 < 0.1%
2021-11-24 13:00:39 2 < 0.1%
2021-11-24 18:48:25 2 < 0.1%
Other values (702) 702 14.6%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Value Count Frequency (%)
2021-11-23 147 10.1%
2021-11-22 80 5.5%
2021-11-24 74 5.1%
2021-12-02 61 4.2%
2021-11-25 61 4.2%
2021-11-26 35 2.4%
2021-11-29 34 2.3%
2021-11-27 31 2.1%
2021-12-03 28 1.9%
2021-12-06 27 1.9%
Other values (726) 872 60.1%

Most occurring characters

Value Count Frequency (%)
2 2950 21.4%
1 2946 21.4%
0 1521 11.0%
- 1450 10.5%
: 1450 10.5%
(space) 725 5.3%
3 721 5.2%
4 569 4.1%
5 502 3.6%
6 250 1.8%
Other values (3) 691 5.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 10150 73.7%
Dash Punctuation 1450 10.5%
Other Punctuation 1450 10.5%
Space Separator 725 5.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
2 2950 29.1%
1 2946 29.0%
0 1521 15.0%
3 721 7.1%
4 569 5.6%
5 502 4.9%
6 250 2.5%
7 241 2.4%
8 231 2.3%
9 219 2.2%

Dash Punctuation
Value Count Frequency (%)
- 1450 100.0%

Other Punctuation
Value Count Frequency (%)
: 1450 100.0%

Space Separator
Value Count Frequency (%)
(space) 725 100.0%

Most occurring scripts

Value Count Frequency (%)
Common 13775 100.0%

Most frequent character per script

Common
Value Count Frequency (%)
2 2950 21.4%
1 2946 21.4%
0 1521 11.0%
- 1450 10.5%
: 1450 10.5%
(space) 725 5.3%
3 721 5.2%
4 569 4.1%
5 502 3.6%
6 250 1.8%
Other values (3) 691 5.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 13775 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
2 2950 21.4%
1 2946 21.4%
0 1521 11.0%
- 1450 10.5%
: 1450 10.5%
(space) 725 5.3%
3 721 5.2%
4 569 4.1%
5 502 3.6%
6 250 1.8%
Other values (3) 691 5.0%

requesterId
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

The Id of the user making the task (asking a question)

Distinct 126
Distinct (%) 17.4%
Missing 4075
Missing (%) 84.9%
Infinite 0
Infinite (%) 0.0%
Mean 345.9737931
Minimum 5
Maximum 510
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 37.6 KiB

Quantile statistics

Minimum 5
5-th percentile 94
Q1 341
median 394
Q3 428
95-th percentile 479.8
Maximum 510
Range 505
Interquartile range (IQR) 87

Descriptive statistics

Standard deviation 121.8295414
Coefficient of variation (CV) 0.3521351727
Kurtosis 0.1032130144
Mean 345.9737931
Median Absolute Deviation (MAD) 39
Skewness -1.159377193
Sum 250831
Variance 14842.43716
Monotonicity Not monotonic
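
Several of the requesterId statistics above are derived from one another. The sketch below checks those relationships, assuming the 725 non-missing rows (4800 observations minus 4075 missing) as the sample size:

```python
# Figures copied from the requesterId statistics above.
mean = 345.9737931
std = 121.8295414
total = 250831
n = 4800 - 4075  # non-missing rows
q1, q3 = 341, 428
minimum, maximum = 5, 510

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation
mean_from_sum = total / n        # Mean is the sum over the non-missing count

print(value_range, iqr, round(cv, 6), round(variance, 3), round(mean_from_sum, 4))
```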
Histogram with fixed size bins (bins=50)

Value Count Frequency (%)
433 51 1.1%
347 37 0.8%
394 29 0.6%
378 27 0.6%
147 21 0.4%
387 20 0.4%
107 19 0.4%
404 19 0.4%
355 18 0.4%
426 18 0.4%
Other values (116) 466 9.7%
(Missing) 4075 84.9%

Value Count Frequency (%)
5 1 < 0.1%
10 2 < 0.1%
22 2 < 0.1%
40 3 0.1%
49 8 0.2%
51 3 0.1%
81 7 0.1%
85 5 0.1%
94 15 0.3%
107 19 0.4%

Value Count Frequency (%)
510 2 < 0.1%
509 1 < 0.1%
508 1 < 0.1%
507 1 < 0.1%
505 1 < 0.1%
504 4 0.1%
503 1 < 0.1%
501 1 < 0.1%
500 1 < 0.1%
499 2 < 0.1%

appId
Categorical

HIGH CORRELATION
MISSING

The chatbot deployment

Distinct 5
Distinct (%) 0.7%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
sVq48ryxGe 220
2O4ppqNC6f 215
HnH5iaO6VI 180
GuS9StLxrv 63
LUOmwNXfZq 47

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 7250
Distinct characters 34
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row HnH5iaO6VI
2nd row HnH5iaO6VI
3rd row HnH5iaO6VI
4th row HnH5iaO6VI
5th row HnH5iaO6VI

Common Values

Value Count Frequency (%)
sVq48ryxGe 220 4.6%
2O4ppqNC6f 215 4.5%
HnH5iaO6VI 180 3.8%
GuS9StLxrv 63 1.3%
LUOmwNXfZq 47 1.0%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
svq48ryxge 220 30.3%
2o4ppqnc6f 215 29.7%
hnh5iao6vi 180 24.8%
gus9stlxrv 63 8.7%
luomwnxfzq 47 6.5%
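
The two appId tables above report different percentages for the same counts. The figures are consistent with the Common Values table dividing by all 4800 rows and the Category Frequency Plot table dividing by the 725 non-missing rows. This is inferred from the numbers, and the sketch below shows both calculations:

```python
# Counts copied from the appId Common Values table.
counts = {
    "sVq48ryxGe": 220,
    "2O4ppqNC6f": 215,
    "HnH5iaO6VI": 180,
    "GuS9StLxrv": 63,
    "LUOmwNXfZq": 47,
}
n_rows = 4800
non_missing = sum(counts.values())

# Share of all rows (Common Values) and share of non-missing rows (plot table).
share_of_rows = {k: 100 * v / n_rows for k, v in counts.items()}
share_of_non_missing = {k: 100 * v / non_missing for k, v in counts.items()}

for key in counts:
    print(key, f"{share_of_rows[key]:.1f}%", f"{share_of_non_missing[key]:.1f}%")
```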

Most occurring characters

Value Count Frequency (%)
q 482 6.6%
O 442 6.1%
4 435 6.0%
p 430 5.9%
V 400 5.5%
6 395 5.4%
H 360 5.0%
r 283 3.9%
x 283 3.9%
G 283 3.9%
Other values (24) 3457 47.7%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 3223 44.5%
Uppercase Letter 2519 34.7%
Decimal Number 1508 20.8%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
q 482 15.0%
p 430 13.3%
r 283 8.8%
x 283 8.8%
f 262 8.1%
s 220 6.8%
e 220 6.8%
y 220 6.8%
n 180 5.6%
i 180 5.6%
Other values (6) 463 14.4%

Uppercase Letter
Value Count Frequency (%)
O 442 17.5%
V 400 15.9%
H 360 14.3%
G 283 11.2%
N 262 10.4%
C 215 8.5%
I 180 7.1%
S 126 5.0%
L 110 4.4%
U 47 1.9%
Other values (2) 94 3.7%

Decimal Number
Value Count Frequency (%)
4 435 28.8%
6 395 26.2%
8 220 14.6%
2 215 14.3%
5 180 11.9%
9 63 4.2%

Most occurring scripts

Value Count Frequency (%)
Latin 5742 79.2%
Common 1508 20.8%

Most frequent character per script

Latin
Value Count Frequency (%)
q 482 8.4%
O 442 7.7%
p 430 7.5%
V 400 7.0%
H 360 6.3%
r 283 4.9%
x 283 4.9%
G 283 4.9%
f 262 4.6%
N 262 4.6%
Other values (18) 2255 39.3%

Common
Value Count Frequency (%)
4 435 28.8%
6 395 26.2%
8 220 14.6%
2 215 14.3%
5 180 11.9%
9 63 4.2%

Most occurring blocks

Value Count Frequency (%)
ASCII 7250 100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
q 482 6.6%
O 442 6.1%
4 435 6.0%
p 430 5.9%
V 400 5.5%
6 395 5.4%
H 360 5.0%
r 283 3.9%
x 283 3.9%
G 283 3.9%
Other values (24) 3457 47.7%

closeTs
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Closing timestamp of the task (if the task had an accepted answer)

Distinct 269
Distinct (%) 100.0%
Missing 4531
Missing (%) 94.4%
Memory size 37.6 KiB
2021-12-06 13:23:03 1
2021-11-22 19:09:04 1
2021-11-22 19:37:43 1
2021-11-23 09:32:24 1
2021-11-23 00:51:41 1
Other values (264) 264

Length

Max length 19
Median length 19
Mean length 19
Min length 19

Characters and Unicode

Total characters 5111
Distinct characters 13
Distinct categories 4
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 269
Unique (%) 100.0%

Sample

1st row 2021-11-22 15:04:33
2nd row 2021-11-23 09:03:31
3rd row 2021-11-22 11:19:27
4th row 2021-11-22 11:22:17
5th row 2021-11-22 13:34:13
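
The profiler treats closeTs as Categorical because the values are stored as 19-character strings. A minimal sketch of turning them into datetimes follows. It assumes the `YYYY-MM-DD HH:MM:SS` layout seen in the sample rows holds for every non-missing value and that missing values arrive as `None`; the helper name `parse_close_ts` is hypothetical.

```python
from datetime import datetime
from typing import Optional

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # layout inferred from the sample rows (assumption)

def parse_close_ts(raw: Optional[str]) -> Optional[datetime]:
    """Parse one closeTs string; pass missing values through as None."""
    if raw is None:
        return None
    return datetime.strptime(raw, TS_FORMAT)

# Two sample values from the report plus one missing value.
samples = ["2021-11-22 15:04:33", "2021-11-23 09:03:31", None]
parsed = [parse_close_ts(s) for s in samples]
```

Once parsed, the values can be compared or sorted chronologically, which the string statistics in this section cannot express.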

Common Values

Value Count Frequency (%)
2021-12-06 13:23:03 1 < 0.1%
2021-11-22 19:09:04 1 < 0.1%
2021-11-22 19:37:43 1 < 0.1%
2021-11-23 09:32:24 1 < 0.1%
2021-11-23 00:51:41 1 < 0.1%
2021-11-22 16:13:46 1 < 0.1%
2021-12-06 22:55:50 1 < 0.1%
2021-12-04 10:48:31 1 < 0.1%
2021-11-24 13:58:32 1 < 0.1%
2021-12-03 12:53:52 1 < 0.1%
Other values (259) 259 5.4%
(Missing) 4531 94.4%

Length

Histogram of lengths of the category
Value Count Frequency (%)
2021-11-23 47
8.7%
2021-11-24 33
6.1%
2021-11-22 30
5.6%
2021-11-25 20
3.7%
2021-12-06 19
3.5%
2021-11-29 19
3.5%
2021-11-30 14
2.6%
2021-11-26 13
2.4%
2021-12-03 10
1.9%
2021-11-27 9
1.7%
Other values (285) 324
60.2%

Most occurring characters

Value Count Frequency (%)
1 1121
21.9%
2 1061
20.8%
0 578
11.3%
- 538
10.5%
: 538
10.5%
(space) 269
5.3%
3 267
5.2%
5 196
3.8%
4 185
3.6%
9 106
2.1%
Other values (3) 252
4.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 3766
73.7%
Dash Punctuation 538
10.5%
Other Punctuation 538
10.5%
Space Separator 269
5.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 1121
29.8%
2 1061
28.2%
0 578
15.3%
3 267
7.1%
5 196
5.2%
4 185
4.9%
9 106
2.8%
6 99
2.6%
7 89
2.4%
8 64
1.7%
Dash Punctuation
Value Count Frequency (%)
- 538
100.0%
Other Punctuation
Value Count Frequency (%)
: 538
100.0%
Space Separator
Value Count Frequency (%)
(space) 269
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 5111
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 1121
21.9%
2 1061
20.8%
0 578
11.3%
- 538
10.5%
: 538
10.5%
(space) 269
5.3%
3 267
5.2%
5 196
3.8%
4 185
3.6%
9 106
2.1%
Other values (3) 252
4.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 5111
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 1121
21.9%
2 1061
20.8%
0 578
11.3%
- 538
10.5%
: 538
10.5%
(space) 269
5.3%
3 267
5.2%
5 196
3.8%
4 185
3.6%
9 106
2.1%
Other values (3) 252
4.9%

communityId
Categorical

HIGH CORRELATION
MISSING

The Id of the users in the community

Distinct 5
Distinct (%) 0.7%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
618e3ae7af7f96368a125dfb 220
618e4110af7f96368a125dfd 215
618e45a7af7f96368a125e12 180
618e4e2baf7f96368a125e28 63
618e4006af7f96368a125dfc 47

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 17400
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 618e45a7af7f96368a125e12
2nd row 618e45a7af7f96368a125e12
3rd row 618e45a7af7f96368a125e12
4th row 618e45a7af7f96368a125e12
5th row 618e45a7af7f96368a125e12

Common Values

Value Count Frequency (%)
618e3ae7af7f96368a125dfb 220 4.6%
618e4110af7f96368a125dfd 215 4.5%
618e45a7af7f96368a125e12 180 3.8%
618e4e2baf7f96368a125e28 63 1.3%
618e4006af7f96368a125dfc 47 1.0%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
618e3ae7af7f96368a125dfb 220
30.3%
618e4110af7f96368a125dfd 215
29.7%
618e45a7af7f96368a125e12 180
24.8%
618e4e2baf7f96368a125e28 63
8.7%
618e4006af7f96368a125dfc 47
6.5%
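
The same counts appear with two different percentages in this section: the Common Values table divides by all 4800 observations, while the Category Frequency Plot table divides by the 725 non-missing values only. The sketch below reproduces both columns from the reported counts. The helper `pct` and the variable names are illustrative, not part of the profiling tool.

```python
# Counts per communityId, as reported in this section.
counts = {
    "618e3ae7af7f96368a125dfb": 220,
    "618e4110af7f96368a125dfd": 215,
    "618e45a7af7f96368a125e12": 180,
    "618e4e2baf7f96368a125e28": 63,
    "618e4006af7f96368a125dfc": 47,
}
total_rows = 4800                   # number of observations in the dataset
non_missing = sum(counts.values())  # rows where communityId is present
missing = total_rows - non_missing  # rows where communityId is absent

def pct(count: int, denominator: int) -> float:
    """Percentage rounded to one decimal, as displayed in the report."""
    return round(100 * count / denominator, 1)

# Denominator = all rows (Common Values table).
share_of_all = {k: pct(v, total_rows) for k, v in counts.items()}
# Denominator = non-missing rows (Category Frequency Plot table).
share_of_present = {k: pct(v, non_missing) for k, v in counts.items()}
```

When comparing variables with different missing rates, the second denominator is usually the fairer one, since it is not diluted by the 84.9% of rows with no value.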

Most occurring characters

Value Count Frequency (%)
6 2222
12.8%
1 2060
11.8%
f 1932
11.1%
a 1850
10.6%
8 1513
8.7%
e 1251
7.2%
7 1125
6.5%
2 1031
5.9%
3 945
5.4%
5 905
5.2%
Other values (6) 2566
14.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 11340
65.2%
Lowercase Letter 6060
34.8%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
6 2222
19.6%
1 2060
18.2%
8 1513
13.3%
7 1125
9.9%
2 1031
9.1%
3 945
8.3%
5 905
8.0%
9 725
6.4%
4 505
4.5%
0 309
2.7%
Lowercase Letter
Value Count Frequency (%)
f 1932
31.9%
a 1850
30.5%
e 1251
20.6%
d 697
11.5%
b 283
4.7%
c 47
0.8%

Most occurring scripts

Value Count Frequency (%)
Common 11340
65.2%
Latin 6060
34.8%

Most frequent character per script

Common
Value Count Frequency (%)
6 2222
19.6%
1 2060
18.2%
8 1513
13.3%
7 1125
9.9%
2 1031
9.1%
3 945
8.3%
5 905
8.0%
9 725
6.4%
4 505
4.5%
0 309
2.7%
Latin
Value Count Frequency (%)
f 1932
31.9%
a 1850
30.5%
e 1251
20.6%
d 697
11.5%
b 283
4.7%
c 47
0.8%

Most occurring blocks

Value Count Frequency (%)
ASCII 17400
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
6 2222
12.8%
1 2060
11.8%
f 1932
11.1%
a 1850
10.6%
8 1513
8.7%
e 1251
7.2%
7 1125
6.5%
2 1031
5.9%
3 945
5.4%
5 905
5.2%
Other values (6) 2566
14.7%

transactions.taskId
Categorical

HIGH CARDINALITY

The task’s id

Distinct 725
Distinct (%) 15.1%
Missing 0
Missing (%) 0.0%
Memory size 37.6 KiB
61b1cfbad550fb0168835c20 28
61a88464d550fb0168835b63 22
61b4fad8d550fb0168835c25 17
619f4c7659410d1c776de024 17
61afc01bd550fb0168835c1a 16
Other values (720) 4700

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 115200
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 9
Unique (%) 0.2%

Sample

1st row 619b9aa959410d1c776ddf15
2nd row 619b9aa959410d1c776ddf15
3rd row 619b9aa959410d1c776ddf15
4th row 619b9aa959410d1c776ddf15
5th row 619b9aa959410d1c776ddf15

Common Values

Value Count Frequency (%)
61b1cfbad550fb0168835c20 28 0.6%
61a88464d550fb0168835b63 22 0.5%
61b4fad8d550fb0168835c25 17 0.4%
619f4c7659410d1c776de024 17 0.4%
61afc01bd550fb0168835c1a 16 0.3%
619f8c3659410d1c776de037 16 0.3%
61afa27bd550fb0168835c18 16 0.3%
619f523759410d1c776de027 16 0.3%
61af85d1d550fb0168835c15 16 0.3%
61ac7f44d550fb0168835bd4 16 0.3%
Other values (715) 4620 96.2%

Length

Histogram of lengths of the category
Value Count Frequency (%)
61b1cfbad550fb0168835c20 28
0.6%
61a88464d550fb0168835b63 22
0.5%
61b4fad8d550fb0168835c25 17
0.4%
619f4c7659410d1c776de024 17
0.4%
61af85d1d550fb0168835c15 16
0.3%
61a3bb72305c9210fd8b8963 16
0.3%
619d0a6159410d1c776ddf7c 16
0.3%
61ac7f44d550fb0168835bd4 16
0.3%
619f523759410d1c776de027 16
0.3%
61afa27bd550fb0168835c18 16
0.3%
Other values (715) 4620
96.2%

Most occurring characters

Value Count Frequency (%)
1 14081
12.2%
d 11593
10.1%
6 10760
9.3%
5 9942
8.6%
0 9564
8.3%
9 8428
7.3%
7 7038
6.1%
8 6710
5.8%
f 6159
5.3%
c 6083
5.3%
Other values (6) 24842
21.6%

Most occurring categories

Value Count Frequency (%)
Decimal Number 78287
68.0%
Lowercase Letter 36913
32.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 14081
18.0%
6 10760
13.7%
5 9942
12.7%
0 9564
12.2%
9 8428
10.8%
7 7038
9.0%
8 6710
8.6%
4 4471
5.7%
3 4401
5.6%
2 2892
3.7%
Lowercase Letter
Value Count Frequency (%)
d 11593
31.4%
f 6159
16.7%
c 6083
16.5%
b 6028
16.3%
a 4063
11.0%
e 2987
8.1%

Most occurring scripts

Value Count Frequency (%)
Common 78287
68.0%
Latin 36913
32.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 14081
18.0%
6 10760
13.7%
5 9942
12.7%
0 9564
12.2%
9 8428
10.8%
7 7038
9.0%
8 6710
8.6%
4 4471
5.7%
3 4401
5.6%
2 2892
3.7%
Latin
Value Count Frequency (%)
d 11593
31.4%
f 6159
16.7%
c 6083
16.5%
b 6028
16.3%
a 4063
11.0%
e 2987
8.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 115200
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 14081
12.2%
d 11593
10.1%
6 10760
9.3%
5 9942
8.6%
0 9564
8.3%
9 8428
7.3%
7 7038
6.1%
8 6710
5.8%
f 6159
5.3%
c 6083
5.3%
Other values (6) 24842
21.6%

transactions.label
Categorical

HIGH CORRELATION

Label of action on task (answerTransaction, bestAnswerTransaction, CREATE_TASK, moreAnswerTransaction, notAnswerTransaction, reportAnswerTransaction, reportQuestionTransaction)

Distinct 7
Distinct (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory size 37.6 KiB
answerTransaction 2804
CREATE_TASK 725
notAnswerTransaction 622
moreAnswerTransaction 367
bestAnswerTransaction 273
Other values (2) 9

Length

Max length 25
Median length 17
Mean length 17.02958333
Min length 11

Characters and Unicode

Total characters 81742
Distinct characters 23
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row CREATE_TASK
2nd row answerTransaction
3rd row moreAnswerTransaction
4th row answerTransaction
5th row answerTransaction

Common Values

Value Count Frequency (%)
answerTransaction 2804 58.4%
CREATE_TASK 725 15.1%
notAnswerTransaction 622 13.0%
moreAnswerTransaction 367 7.6%
bestAnswerTransaction 273 5.7%
reportQuestionTransaction 6 0.1%
reportAnswerTransaction 3 0.1%
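
The overview at the top of this variable lists only the five most frequent labels and folds the rest into an "Other values (2)" row with a count of 9. The sketch below shows one way such a row can be derived from the full counts. The function `summarize` is a hypothetical helper written for illustration and is not the profiling tool's own code.

```python
from collections import Counter

# Full label counts, as reported in the Common Values table.
label_counts = Counter({
    "answerTransaction": 2804,
    "CREATE_TASK": 725,
    "notAnswerTransaction": 622,
    "moreAnswerTransaction": 367,
    "bestAnswerTransaction": 273,
    "reportQuestionTransaction": 6,
    "reportAnswerTransaction": 3,
})

def summarize(counts: Counter, top_n: int = 5):
    """Return the top_n rows plus one collapsed 'Other values' row."""
    ranked = counts.most_common()          # sorted by count, descending
    head, tail = ranked[:top_n], ranked[top_n:]
    rows = list(head)
    if tail:
        rows.append((f"Other values ({len(tail)})", sum(c for _, c in tail)))
    return rows

rows = summarize(label_counts)
```

The counts also sum to the 4800 observations, which matches the 0 missing values reported for this variable.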

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
answertransaction 2804
58.4%
create_task 725
15.1%
notanswertransaction 622
13.0%
moreanswertransaction 367
7.6%
bestanswertransaction 273
5.7%
reportquestiontransaction 6
0.1%
reportanswertransaction 3
0.1%

Most occurring characters

Value Count Frequency (%)
n 12847
15.7%
a 10954
13.4%
r 8529
10.4%
s 8423
10.3%
T 5525
6.8%
o 5079
6.2%
t 4985
6.1%
e 4724
5.8%
i 4081
5.0%
c 4075
5.0%
Other values (13) 12520
15.3%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 68421
83.7%
Uppercase Letter 12596
15.4%
Connector Punctuation 725
0.9%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
n 12847
18.8%
a 10954
16.0%
r 8529
12.5%
s 8423
12.3%
o 5079
7.4%
t 4985
7.3%
e 4724
6.9%
i 4081
6.0%
c 4075
6.0%
w 4069
5.9%
Other values (4) 655
1.0%
Uppercase Letter
Value Count Frequency (%)
T 5525
43.9%
A 2715
21.6%
E 1450
11.5%
S 725
5.8%
K 725
5.8%
C 725
5.8%
R 725
5.8%
Q 6
< 0.1%
Connector Punctuation
Value Count Frequency (%)
_ 725
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 81017
99.1%
Common 725
0.9%

Most frequent character per script

Latin
Value Count Frequency (%)
n 12847
15.9%
a 10954
13.5%
r 8529
10.5%
s 8423
10.4%
T 5525
6.8%
o 5079
6.3%
t 4985
6.2%
e 4724
5.8%
i 4081
5.0%
c 4075
5.0%
Other values (12) 11795
14.6%
Common
Value Count Frequency (%)
_ 725
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 81742
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
n 12847
15.7%
a 10954
13.4%
r 8529
10.4%
s 8423
10.3%
T 5525
6.8%
o 5079
6.2%
t 4985
6.1%
e 4724
5.8%
i 4081
5.0%
c 4075
5.0%
Other values (13) 12520
15.3%

transactions.creationTs
Categorical

HIGH CARDINALITY
UNIFORM

Creation timestamp of transaction

Distinct 4796
Distinct (%) 99.9%
Missing 0
Missing (%) 0.0%
Memory size 37.6 KiB
2021-11-24 14:36:01 2
2021-11-23 22:54:43 2
2021-11-25 11:42:31 2
2021-11-25 10:17:44 2
2021-11-22 10:27:07 1
Other values (4791) 4791

Length

Max length 19
Median length 19
Mean length 19
Min length 19

Characters and Unicode

Total characters 91200
Distinct characters 13
Distinct categories 4
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4792
Unique (%) 99.8%

Sample

1st row 2021-11-22 10:27:07
2nd row 2021-11-22 10:56:02
3rd row 2021-11-22 10:58:55
4th row 2021-11-22 11:01:40
5th row 2021-11-22 13:35:50

Common Values

Value Count Frequency (%)
2021-11-24 14:36:01 2 < 0.1%
2021-11-23 22:54:43 2 < 0.1%
2021-11-25 11:42:31 2 < 0.1%
2021-11-25 10:17:44 2 < 0.1%
2021-11-22 10:27:07 1 < 0.1%
2021-11-24 01:18:43 1 < 0.1%
2021-11-24 11:45:44 1 < 0.1%
2021-11-24 11:41:12 1 < 0.1%
2021-11-24 11:11:23 1 < 0.1%
2021-11-24 10:34:41 1 < 0.1%
Other values (4786) 4786 99.7%

Length

Histogram of lengths of the category
Value Count Frequency (%)
2021-11-23 870
9.1%
2021-11-25 467
4.9%
2021-11-24 437
4.6%
2021-11-22 431
4.5%
2021-12-02 320
3.3%
2021-11-29 238
2.5%
2021-12-03 234
2.4%
2021-11-26 233
2.4%
2021-12-06 221
2.3%
2021-11-27 208
2.2%
Other values (4638) 5941
61.9%

Most occurring characters

Value Count Frequency (%)
1 19427
21.3%
2 19360
21.2%
0 10299
11.3%
- 9600
10.5%
: 9600
10.5%
(space) 4800
5.3%
3 4598
5.0%
4 3548
3.9%
5 3394
3.7%
9 1696
1.9%
Other values (3) 4878
5.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 67200
73.7%
Dash Punctuation 9600
10.5%
Other Punctuation 9600
10.5%
Space Separator 4800
5.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 19427
28.9%
2 19360
28.8%
0 10299
15.3%
3 4598
6.8%
4 3548
5.3%
5 3394
5.1%
9 1696
2.5%
7 1666
2.5%
6 1643
2.4%
8 1569
2.3%
Dash Punctuation
Value Count Frequency (%)
- 9600
100.0%
Other Punctuation
Value Count Frequency (%)
: 9600
100.0%
Space Separator
Value Count Frequency (%)
(space) 4800
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 91200
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 19427
21.3%
2 19360
21.2%
0 10299
11.3%
- 9600
10.5%
: 9600
10.5%
(space) 4800
5.3%
3 4598
5.0%
4 3548
3.9%
5 3394
3.7%
9 1696
1.9%
Other values (3) 4878
5.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 91200
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 19427
21.3%
2 19360
21.2%
0 10299
11.3%
- 9600
10.5%
: 9600
10.5%
(space) 4800
5.3%
3 4598
5.0%
4 3548
3.9%
5 3394
3.7%
9 1696
1.9%
Other values (3) 4878
5.3%

transactions.actioneerId
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Id of the user who performed the transaction action

Distinct 151
Distinct (%) 3.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 347.1710417
Minimum 5
Maximum 510
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 37.6 KiB

Quantile statistics

Minimum 5
5-th percentile 94
Q1 340
median 385
Q3 428
95-th percentile 494
Maximum 510
Range 505
Interquartile range (IQR) 88

Descriptive statistics

Standard deviation 124.0612767
Coefficient of variation (CV) 0.3573491501
Kurtosis 0.1556211411
Mean 347.1710417
Median Absolute Deviation (MAD) 43
Skewness -1.108932541
Sum 1666421
Variance 15391.20037
Monotonicity Not monotonic
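
Several of the figures above are derived from one another, so they can be cross-checked with simple arithmetic. The sketch below recomputes the mean, variance, coefficient of variation, range and IQR from the other reported values. The tolerance of 1e-6 is an arbitrary choice that allows for the rounding of the printed numbers.

```python
import math

n = 4800                 # number of observations
total = 1666421          # reported Sum
mean = 347.1710417       # reported Mean
std = 124.0612767        # reported Standard deviation
variance = 15391.20037   # reported Variance
cv = 0.3573491501        # reported Coefficient of variation
minimum, maximum = 5, 510
q1, q3 = 340, 428

TOL = 1e-6  # relative tolerance for values printed with about 10 digits

checks = {
    "mean = sum / n": math.isclose(total / n, mean, rel_tol=TOL),
    "variance = std ** 2": math.isclose(std ** 2, variance, rel_tol=TOL),
    "cv = std / mean": math.isclose(std / mean, cv, rel_tol=TOL),
    "range = max - min": maximum - minimum == 505,
    "iqr = q3 - q1": q3 - q1 == 88,
}
```

Because actioneerId is an identifier, these statistics describe how the ids happen to be distributed rather than any measured quantity, so they are mainly useful as a consistency check.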
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
347 249
5.2%
151 135
2.8%
426 126
2.6%
387 114
2.4%
357 114
2.4%
211 114
2.4%
404 110
2.3%
394 106
2.2%
94 103
2.1%
355 97
2.0%
Other values (141) 3532
73.6%
Value Count Frequency (%)
5 3
0.1%
8 2
< 0.1%
10 11
0.2%
14 1
< 0.1%
20 5
0.1%
22 66
1.4%
40 26
0.5%
49 20
0.4%
51 28
0.6%
81 28
0.6%
Value Count Frequency (%)
510 26
0.5%
509 4
0.1%
508 6
0.1%
507 26
0.5%
505 5
0.1%
504 30
0.6%
503 32
0.7%
501 16
0.3%
500 20
0.4%
499 14
0.3%

transactions.lastUpdateTs
Categorical

HIGH CARDINALITY
UNIFORM

Last update timestamp of transaction

Distinct 4784
Distinct (%) 99.7%
Missing 0
Missing (%) 0.0%
Memory size 37.6 KiB
2021-11-23 20:19:04 2
2021-12-02 20:42:47 2
2021-11-24 10:58:01 2
2021-11-23 10:31:55 2
2021-11-22 17:59:15 2
Other values (4779) 4790

Length

Max length 19
Median length 19
Mean length 19
Min length 19

Characters and Unicode

Total characters 91200
Distinct characters 13
Distinct categories 4
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4768
Unique (%) 99.3%

Sample

1st row 2021-11-22 10:27:15
2nd row 2021-11-22 10:56:05
3rd row 2021-11-22 10:58:55
4th row 2021-11-22 11:01:46
5th row 2021-11-22 13:35:53
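
These sample values can be set against the transactions.creationTs samples to see how long after creation each transaction was last updated. The sketch below assumes that the "1st row" to "5th row" samples of both variables refer to the same five records, which the report does not state explicitly. The variable names are illustrative.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Sample rows of transactions.creationTs and transactions.lastUpdateTs.
creation = [
    "2021-11-22 10:27:07",
    "2021-11-22 10:56:02",
    "2021-11-22 10:58:55",
    "2021-11-22 11:01:40",
    "2021-11-22 13:35:50",
]
last_update = [
    "2021-11-22 10:27:15",
    "2021-11-22 10:56:05",
    "2021-11-22 10:58:55",
    "2021-11-22 11:01:46",
    "2021-11-22 13:35:53",
]

# Seconds between creation and last update, row by row.
lags = [
    (datetime.strptime(u, FMT) - datetime.strptime(c, FMT)).total_seconds()
    for c, u in zip(creation, last_update)
]
```

Under that row-alignment assumption, the five sample transactions were last updated between 0 and 8 seconds after they were created.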

Common Values

Value Count Frequency (%)
2021-11-23 20:19:04 2 < 0.1%
2021-12-02 20:42:47 2 < 0.1%
2021-11-24 10:58:01 2 < 0.1%
2021-11-23 10:31:55 2 < 0.1%
2021-11-22 17:59:15 2 < 0.1%
2021-12-04 01:51:46 2 < 0.1%
2021-11-23 14:53:18 2 < 0.1%
2021-12-03 14:49:59 2 < 0.1%
2021-12-06 15:34:11 2 < 0.1%
2021-11-23 19:36:44 2 < 0.1%
Other values (4774) 4780 99.6%

Length

Histogram of lengths of the category
Value Count Frequency (%)
2021-11-23 870
9.1%
2021-11-25 467
4.9%
2021-11-24 437
4.6%
2021-11-22 431
4.5%
2021-12-02 320
3.3%
2021-11-29 238
2.5%
2021-12-03 234
2.4%
2021-11-26 233
2.4%
2021-12-06 221
2.3%
2021-11-27 208
2.2%
Other values (4621) 5941
61.9%

Most occurring characters

Value Count Frequency (%)
1 19414
21.3%
2 19358
21.2%
0 10342
11.3%
- 9600
10.5%
: 9600
10.5%
(space) 4800
5.3%
3 4548
5.0%
5 3505
3.8%
4 3486
3.8%
9 1720
1.9%
Other values (3) 4827
5.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 67200
73.7%
Dash Punctuation 9600
10.5%
Other Punctuation 9600
10.5%
Space Separator 4800
5.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 19414
28.9%
2 19358
28.8%
0 10342
15.4%
3 4548
6.8%
5 3505
5.2%
4 3486
5.2%
9 1720
2.6%
6 1658
2.5%
7 1608
2.4%
8 1561
2.3%
Dash Punctuation
Value Count Frequency (%)
- 9600
100.0%
Other Punctuation
Value Count Frequency (%)
: 9600
100.0%
Space Separator
Value Count Frequency (%)
(space) 4800
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 91200
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 19414
21.3%
2 19358
21.2%
0 10342
11.3%
- 9600
10.5%
: 9600
10.5%
(space) 4800
5.3%
3 4548
5.0%
5 3505
3.8%
4 3486
3.8%
9 1720
1.9%
Other values (3) 4827
5.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 91200
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 19414
21.3%
2 19358
21.2%
0 10342
11.3%
- 9600
10.5%
: 9600
10.5%
(space) 4800
5.3%
3 4548
5.0%
5 3505
3.8%
4 3486
3.8%
9 1720
1.9%
Other values (3) 4827
5.3%

transactions.count.id
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Count of follow-up actions on the task (the higher the number, the more actions were performed on the task)

Distinct 28
Distinct (%) 0.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 3.663541667
Minimum 0
Maximum 27
Zeros 725
Zeros (%) 15.1%
Negative 0
Negative (%) 0.0%
Memory size 37.6 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 1
median 3
Q3 5
95-th percentile 10
Maximum 27
Range 27
Interquartile range (IQR) 4

Descriptive statistics

Standard deviation 3.292683226
Coefficient of variation (CV) 0.8987705137
Kurtosis 3.268259384
Mean 3.663541667
Median Absolute Deviation (MAD) 2
Skewness 1.417269215
Sum 17585
Variance 10.84176282
Monotonicity Not monotonic
Histogram with fixed size bins (bins=28)
Value Count Frequency (%)
0 725
15.1%
1 716
14.9%
2 697
14.5%
3 604
12.6%
4 503
10.5%
5 415
8.6%
6 318
6.6%
7 238
5.0%
8 172
3.6%
9 120
2.5%
Other values (18) 292
6.1%
Value Count Frequency (%)
0 725
15.1%
1 716
14.9%
2 697
14.5%
3 604
12.6%
4 503
10.5%
5 415
8.6%
6 318
6.6%
7 238
5.0%
8 172
3.6%
9 120
2.5%
Value Count Frequency (%)
27 1
< 0.1%
26 1
< 0.1%
25 1
< 0.1%
24 1
< 0.1%
23 1
< 0.1%
22 1
< 0.1%
21 2
< 0.1%
20 2
< 0.1%
19 2
< 0.1%
18 2
< 0.1%

transactions.messages.appId
Categorical

HIGH CORRELATION
MISSING

App Id of the transaction message

Distinct 5
Distinct (%) 0.4%
Missing 3600
Missing (%) 75.0%
Memory size 37.6 KiB
sVq48ryxGe 384
2O4ppqNC6f 342
HnH5iaO6VI 233
GuS9StLxrv 158
LUOmwNXfZq 83

Length

Max length 10
Median length 10
Mean length 10
Min length 10

Characters and Unicode

Total characters 12000
Distinct characters 34
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row HnH5iaO6VI
2nd row HnH5iaO6VI
3rd row HnH5iaO6VI
4th row HnH5iaO6VI
5th row HnH5iaO6VI

Common Values

Value Count Frequency (%)
sVq48ryxGe 384 8.0%
2O4ppqNC6f 342 7.1%
HnH5iaO6VI 233 4.9%
GuS9StLxrv 158 3.3%
LUOmwNXfZq 83 1.7%
(Missing) 3600 75.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
svq48ryxge 384
32.0%
2o4ppqnc6f 342
28.5%
hnh5iao6vi 233
19.4%
gus9stlxrv 158
13.2%
luomwnxfzq 83
6.9%

Most occurring characters

Value Count Frequency (%)
q 809
6.7%
4 726
6.0%
p 684
5.7%
O 658
5.5%
V 617
5.1%
6 575
4.8%
r 542
4.5%
x 542
4.5%
G 542
4.5%
H 466
3.9%
Other values (24) 5839
48.7%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 5493
45.8%
Uppercase Letter 4089
34.1%
Decimal Number 2418
20.2%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
q 809
14.7%
p 684
12.5%
r 542
9.9%
x 542
9.9%
f 425
7.7%
s 384
7.0%
e 384
7.0%
y 384
7.0%
n 233
4.2%
i 233
4.2%
Other values (6) 873
15.9%
Uppercase Letter
Value Count Frequency (%)
O 658
16.1%
V 617
15.1%
G 542
13.3%
H 466
11.4%
N 425
10.4%
C 342
8.4%
S 316
7.7%
L 241
5.9%
I 233
5.7%
U 83
2.0%
Other values (2) 166
4.1%
Decimal Number
Value Count Frequency (%)
4 726
30.0%
6 575
23.8%
8 384
15.9%
2 342
14.1%
5 233
9.6%
9 158
6.5%

Most occurring scripts

Value Count Frequency (%)
Latin 9582
79.8%
Common 2418
20.2%

Most frequent character per script

Latin
Value Count Frequency (%)
q 809
8.4%
p 684
7.1%
O 658
6.9%
V 617
6.4%
r 542
5.7%
x 542
5.7%
G 542
5.7%
H 466
4.9%
f 425
4.4%
N 425
4.4%
Other values (18) 3872
40.4%
Common
Value Count Frequency (%)
4 726
30.0%
6 575
23.8%
8 384
15.9%
2 342
14.1%
5 233
9.6%
9 158
6.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 12000
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
q 809
6.7%
4 726
6.0%
p 684
5.7%
O 658
5.5%
V 617
5.1%
6 575
4.8%
r 542
4.5%
x 542
4.5%
G 542
4.5%
H 466
3.9%
Other values (24) 5839
48.7%

transactions.messages.receiverId
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

User Id of the transaction message

Distinct 725
Distinct (%) 100.0%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
380|374|162|1|367|395|164|85|94|369|356|192|361|107|8|357|357|357|357|357|357|357|357|357|357|357|357 1
378|94|367|357|164|365|361|192|421|374|85|356|395|1|146|409|409|409|409|409|409|409|409|409 1
1|146|192|395|409|380|8|162|94|361|369|164|374|430|107|357|357|357|357|357|357|357|357|357 1
8|192|164|162|107|94|374|395|357|365|378|380|361|369|430|434|409|146|85|421|356|367|1|434|434|434|434|434 1
430|431|374|357|1|85|94|107|365|8|434|380|146|367|378|421|421|421|421|430 1
Other values (720) 720

Length

Max length 246
Median length 167
Mean length 87.62758621
Min length 12

Characters and Unicode

Total characters 63530
Distinct characters 12
Distinct categories 3
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 725
Unique (%) 100.0%

Sample

1st row 402|406|1|211|211|211|355
2nd row 211|1|411|402|20|406|410|410|410|410|428
3rd row 410|406|402|411|1|211|20|22|22|402
4th row 411|1|22|20|402|211|406|410|410|410|406
5th row 22|410|406|414|354|402|20|1|211|411|355|22
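
Each receiverId value packs several user ids into one pipe-delimited string, which is why every one of the 725 values is distinct. The sketch below splits such a string into integer ids. The character statistics for this variable also report 921 "." characters, so some tokens apparently carry a decimal point. The code assumes these are float-formatted integers such as `357.0`, which the report does not confirm. The helper name `parse_receiver_ids` is hypothetical.

```python
from collections import Counter
from typing import List

def parse_receiver_ids(raw: str) -> List[int]:
    """Split a pipe-delimited receiverId string into integer user ids.

    Tokens are passed through float() first so that a value such as
    "357.0" (an assumed format) is accepted as well as "357".
    """
    return [int(float(token)) for token in raw.split("|") if token]

# First sample row of this variable.
ids = parse_receiver_ids("402|406|1|211|211|211|355")

# How many times each user id occurs in this one value.
repeats = Counter(ids)
```

Counting repeats per value, as in `repeats`, shows that the same receiver can appear several times within one task, for example user 211 in the first sample row.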

Common Values

Value Count Frequency (%)
380|374|162|1|367|395|164|85|94|369|356|192|361|107|8|357|357|357|357|357|357|357|357|357|357|357|357 1 < 0.1%
378|94|367|357|164|365|361|192|421|374|85|356|395|1|146|409|409|409|409|409|409|409|409|409 1 < 0.1%
1|146|192|395|409|380|8|162|94|361|369|164|374|430|107|357|357|357|357|357|357|357|357|357 1 < 0.1%
8|192|164|162|107|94|374|395|357|365|378|380|361|369|430|434|409|146|85|421|356|367|1|434|434|434|434|434 1 < 0.1%
430|431|374|357|1|85|94|107|365|8|434|380|146|367|378|421|421|421|421|430 1 < 0.1%
356|85|162|107|365|146|395|378|94|192|409|1|367|380|357|434|421|8|361|369|430|164|431|374|434|434|434|434|434|434 1 < 0.1%
94|367|162|361|370|85|1|192|8|365|107|434|395|164|380|147|147|147|147|147|147|147|147 1 < 0.1%
378|146|361|380|1|357|370|164|85|395|409|374|192|421|162|94|94|94|94 1 < 0.1%
107|431|370|361|8|430|162|356|369|367|1|421|164|357|94|436|436|436|436|436|436|436|436|436 1 < 0.1%
107|164|361|436|430|431|374|369|380|162|378|94|146|356|357|421|421|421|421|436 1 < 0.1%
Other values (715) 715 14.9%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category
Value Count Frequency (%)
380|374|162|1|367|395|164|85|94|369|356|192|361|107|8|357|357|357|357|357|357|357|357|357|357|357|357 1
0.1%
428|433|354|22|20|426|420|452|447|423|418|427|410|198|411|355|355 1
0.1%
211|1|411|402|20|406|410|410|410|410|428 1
0.1%
410|406|402|411|1|211|20|22|22|402 1
0.1%
411|1|22|20|402|211|406|410|410|410|406 1
0.1%
22|410|406|414|354|402|20|1|211|411|355|22 1
0.1%
20|402|1|211|354|410|411|406|22|414|355|22 1
0.1%
1|20|411|355|22|354|402|211|410|406|414|420|420|420|420 1
0.1%
406|22|410|411|354|420|423|211|20|414|355|1|402|402|411 1
0.1%
406|354|410|428|22|355|418|198|427|411|423|414|20|420|1|211|211|211 1
0.1%
Other values (715) 715
98.6%

Most occurring characters

Value Count Frequency (%)
| 15502
24.4%
4 10572
16.6%
3 8531
13.4%
1 5793
9.1%
0 4199
6.6%
5 3466
5.5%
8 3162
5.0%
2 3153
5.0%
7 3002
4.7%
9 2804
4.4%
Other values (2) 3346
5.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 47107
74.1%
Math Symbol 15502
24.4%
Other Punctuation 921
1.4%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
4 10572
22.4%
3 8531
18.1%
1 5793
12.3%
0 4199
8.9%
5 3466
7.4%
8 3162
6.7%
2 3153
6.7%
7 3002
6.4%
9 2804
6.0%
6 2425
5.1%
Math Symbol
Value Count Frequency (%)
| 15502
100.0%
Other Punctuation
Value Count Frequency (%)
. 921
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 63530
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
| 15502
24.4%
4 10572
16.6%
3 8531
13.4%
1 5793
9.1%
0 4199
6.6%
5 3466
5.5%
8 3162
5.0%
2 3153
5.0%
7 3002
4.7%
9 2804
4.4%
Other values (2) 3346
5.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 63530
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
| 15502
24.4%
4 10572
16.6%
3 8531
13.4%
1 5793
9.1%
0 4199
6.6%
5 3466
5.5%
8 3162
5.0%
2 3153
5.0%
7 3002
4.7%
9 2804
4.4%
Other values (2) 3346
5.3%

transactions.messages.label
Categorical

HIGH CORRELATION
MISSING

Label of transaction message (AnsweredPickedMessage, AnsweredQuestionMessage, QuestionToAnswerMessage)

Distinct 3
Distinct (%) 0.1%
Missing 2419
Missing (%) 50.4%
Memory size 37.6 KiB
AnsweredQuestionMessage 1164
QuestionToAnswerMessage 948
AnsweredPickedMessage 269

Length

Max length 23
Median length 23
Mean length 22.77404452
Min length 21

Characters and Unicode

Total characters 54225
Distinct characters 19
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row QuestionToAnswerMessage
2nd row AnsweredQuestionMessage
3rd row AnsweredQuestionMessage
4th row AnsweredPickedMessage
5th row QuestionToAnswerMessage

Common Values

Value Count Frequency (%)
AnsweredQuestionMessage 1164 24.2%
QuestionToAnswerMessage 948 19.8%
AnsweredPickedMessage 269 5.6%
(Missing) 2419 50.4%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
answeredquestionmessage 1164
48.9%
questiontoanswermessage 948
39.8%
answeredpickedmessage 269
11.3%

Most occurring characters

Value Count Frequency (%)
e 10957
20.2%
s 9255
17.1%
n 4493
8.3%
o 3060
5.6%
A 2381
4.4%
w 2381
4.4%
r 2381
4.4%
g 2381
4.4%
a 2381
4.4%
M 2381
4.4%
Other values (9) 12174
22.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 46134
85.1%
Uppercase Letter 8091
14.9%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 10957
23.8%
s 9255
20.1%
n 4493
9.7%
o 3060
6.6%
w 2381
5.2%
r 2381
5.2%
g 2381
5.2%
a 2381
5.2%
i 2381
5.2%
t 2112
4.6%
Other values (4) 4352
9.4%
Uppercase Letter
Value Count Frequency (%)
A 2381
29.4%
M 2381
29.4%
Q 2112
26.1%
T 948
11.7%
P 269
3.3%

Most occurring scripts

Value Count Frequency (%)
Latin 54225
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 10957
20.2%
s 9255
17.1%
n 4493
8.3%
o 3060
5.6%
A 2381
4.4%
w 2381
4.4%
r 2381
4.4%
g 2381
4.4%
a 2381
4.4%
M 2381
4.4%
Other values (9) 12174
22.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 54225
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 10957
20.2%
s 9255
17.1%
n 4493
8.3%
o 3060
5.6%
A 2381
4.4%
w 2381
4.4%
r 2381
4.4%
g 2381
4.4%
a 2381
4.4%
M 2381
4.4%
Other values (9) 12174
22.5%

transactions.messages.attributes.taskId
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Task id of the transaction message

Distinct 725
Distinct (%) 60.4%
Missing 3600
Missing (%) 75.0%
Memory size 37.6 KiB
61b1cfbad550fb0168835c20 7
619b73b059410d1c776ddef4 6
61af8860d550fb0168835c16 6
61afc01bd550fb0168835c1a 5
61ae8af4d550fb0168835bf3 5
Other values (720) 1171

Length

Max length 24
Median length 24
Mean length 24
Min length 24

Characters and Unicode

Total characters 28800
Distinct characters 16
Distinct categories 2
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 396
Unique (%) 33.0%

Sample

1st row 619b9aa959410d1c776ddf15
2nd row 619b9aa959410d1c776ddf15
3rd row 619ba35b59410d1c776ddf19
4th row 619ba62f59410d1c776ddf1a
5th row 619ba6f659410d1c776ddf1b

Common Values

Value Count Frequency (%)
61b1cfbad550fb0168835c20 7 0.1%
619b73b059410d1c776ddef4 6 0.1%
61af8860d550fb0168835c16 6 0.1%
61afc01bd550fb0168835c1a 5 0.1%
61ae8af4d550fb0168835bf3 5 0.1%
61adf98dd550fb0168835be6 5 0.1%
619b84d259410d1c776ddf08 5 0.1%
61a3bb72305c9210fd8b8963 5 0.1%
619b7ebf59410d1c776ddf02 5 0.1%
619b756a59410d1c776ddef6 5 0.1%
Other values (715) 1146 23.9%
(Missing) 3600 75.0%

Length

Histogram of lengths of the category
Value Count Frequency (%)
61b1cfbad550fb0168835c20 7
0.6%
61af8860d550fb0168835c16 6
0.5%
619b73b059410d1c776ddef4 6
0.5%
619b756a59410d1c776ddef6 5
0.4%
619b713759410d1c776ddef0 5
0.4%
619d291a59410d1c776ddf8e 5
0.4%
619f8c3659410d1c776de037 5
0.4%
619c631859410d1c776ddf43 5
0.4%
619b7d8859410d1c776ddf01 5
0.4%
61ae0c3fd550fb0168835be9 5
0.4%
Other values (715) 1146
95.5%

Most occurring characters

Value Count Frequency (%)
1 3539
12.3%
d 2878
10.0%
6 2682
9.3%
5 2510
8.7%
0 2416
8.4%
9 2071
7.2%
7 1757
6.1%
8 1682
5.8%
f 1552
5.4%
b 1540
5.3%
Other values (6) 6173
21.4%

Most occurring categories

Value Count Frequency (%)
Decimal Number 19542
67.9%
Lowercase Letter 9258
32.1%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 3539
18.1%
6 2682
13.7%
5 2510
12.8%
0 2416
12.4%
9 2071
10.6%
7 1757
9.0%
8 1682
8.6%
3 1102
5.6%
4 1095
5.6%
2 688
3.5%
Lowercase Letter
Value Count Frequency (%)
d 2878
31.1%
f 1552
16.8%
b 1540
16.6%
c 1505
16.3%
a 1000
10.8%
e 783
8.5%

Most occurring scripts

Value Count Frequency (%)
Common 19542
67.9%
Latin 9258
32.1%

Most frequent character per script

Common
Value Count Frequency (%)
1 3539
18.1%
6 2682
13.7%
5 2510
12.8%
0 2416
12.4%
9 2071
10.6%
7 1757
9.0%
8 1682
8.6%
3 1102
5.6%
4 1095
5.6%
2 688
3.5%
Latin
Value Count Frequency (%)
d 2878
31.1%
f 1552
16.8%
b 1540
16.6%
c 1505
16.3%
a 1000
10.8%
e 783
8.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 28800
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 3539
12.3%
d 2878
10.0%
6 2682
9.3%
5 2510
8.7%
0 2416
8.4%
9 2071
7.2%
7 1757
6.1%
8 1682
5.8%
f 1552
5.4%
b 1540
5.3%
Other values (6) 6173
21.4%

transactions.messages.attributes.question
Categorical

HIGH CARDINALITY
MISSING

Question text of the transaction

Distinct 688
Distinct (%) 17.1%
Missing 777
Missing (%) 16.2%
Memory size 37.6 KiB
What country are you from? 20
Та хэд энэ 3р тунгийн талаар ямар бодолтой байна. 20
Cant stop, wont stop ___ ? 17
What's your favorite animal? and three deep reasons why 17
Mac or Windows - why? 16
Other values (683) 3933

Length

Max length 1392
Median length 204
Mean length 71.40840169
Min length 3

Characters and Unicode

Total characters 287276
Distinct characters 218
Distinct categories 16
Distinct scripts 4
Distinct blocks 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
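
As an illustration of how such per-character properties can be tallied, here is a minimal Python sketch that counts Unicode general categories with the standard `unicodedata` module. It is not the profiling tool's own implementation, and the helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in this report (for example `Ll` is Lowercase Letter, `Zs` is Space Separator, `Po` is Other Punctuation). The standard module exposes categories only; scripts and blocks would need another data source.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories (e.g. 'Ll', 'Zs') over all characters."""
    counts = Counter()
    for text in values:
        counts.update(unicodedata.category(ch) for ch in text)
    return counts


# Two question texts taken from the Common Values table of this variable
sample = ["Mac or Windows - why?", "¿Qué estación del año prefieren más?"]
for category, count in category_counts(sample).most_common():
    print(category, count)
```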

Unique

Unique 23
Unique (%) 0.6%

Sample

1st row Hola a todos, quería saber quienes participan en el chat box por primera vez y quienes somos los que disfrutamos del primer experimento y quisimos volver
2nd row Hola a todos, quería saber quienes participan en el chat box por primera vez y quienes somos los que disfrutamos del primer experimento y quisimos volver
3rd row Hola a todos, quería saber quienes participan en el chat box por primera vez y quienes somos los que disfrutamos del primer experimento y quisimos volver
4th row Hola a todos, quería saber quienes participan en el chat box por primera vez y quienes somos los que disfrutamos del primer experimento y quisimos volver
5th row Hola a todos, quería saber quienes participan en el chat box por primera vez y quienes somos los que disfrutamos del primer experimento y quisimos volver

Common Values

Value Count Frequency (%)
What country are you from? 20 0.4%
Та хэд энэ 3р тунгийн талаар ямар бодолтой байна. 20 0.4%
Cant stop, wont stop ___ ? 17 0.4%
What's your favorite animal? and three deep reasons why 17 0.4%
Mac or Windows - why? 16 0.3%
Mny ner hen gj ochij bn? 16 0.3%
How do you motivator yourself to cleaning your room? 16 0.3%
Why did you say yes to participating in this chatbot project? 15 0.3%
¿Qué estación del año prefieren más? 14 0.3%
Сонсох бүрт амар амгалан аз жаргалыг мэдэрдэг дуунуудаасаа санал болгооч ??? 14 0.3%
Other values (678) 3858 80.4%
(Missing) 777 16.2%

Length

Histogram of lengths of the category
Value Count Frequency (%)
you 1001
2.1%
the 563
1.2%
a 514
1.1%
in 470
1.0%
do 454
0.9%
байна 434
0.9%
is 418
0.9%
to 412
0.8%
what 406
0.8%
405
0.8%
Other values (3610) 43646
89.6%

Most occurring characters

Value Count Frequency (%)
(space) 44676
15.6%
e 16466
5.7%
a 16276
5.7%
o 13163
4.6%
i 10820
3.8%
а 10495
3.7%
t 10275
3.6%
n 10092
3.5%
r 9113
3.2%
s 8646
3.0%
Other values (208) 137254
47.8%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 224334
78.1%
Space Separator 44676
15.6%
Other Punctuation 7818
2.7%
Uppercase Letter 7375
2.6%
Connector Punctuation 688
0.2%
Decimal Number 628
0.2%
Other Symbol 550
0.2%
Close Punctuation 371
0.1%
Control 260
0.1%
Open Punctuation 260
0.1%
Other values (6) 316
0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 16466
7.3%
a 16276
7.3%
o 13163
5.9%
i 10820
4.8%
а 10495
4.7%
t 10275
4.6%
n 10092
4.5%
r 9113
4.1%
s 8646
3.9%
u 7878
3.5%
Other values (61) 111110
49.5%
Uppercase Letter
Value Count Frequency (%)
W 748
10.1%
C 657
8.9%
A 476
6.5%
D 427
5.8%
H 374
5.1%
Q 349
4.7%
I 330
4.5%
P 259
3.5%
T 242
3.3%
S 226
3.1%
Other values (48) 3287
44.6%
Other Symbol
Value Count Frequency (%)
😂 124
22.5%
👀 44
8.0%
😊 44
8.0%
😪 24
4.4%
🤗 24
4.4%
🔥 23
4.2%
🙏 14
2.5%
🥺 12
2.2%
🤣 12
2.2%
😥 12
2.2%
Other values (37) 217
39.5%
Other Punctuation
Value Count Frequency (%)
? 4061
51.9%
. 1139
14.6%
, 1028
13.1%
: 606
7.8%
¿ 301
3.9%
' 256
3.3%
/ 181
2.3%
! 98
1.3%
" 78
1.0%
; 34
0.4%
Other values (6) 36
0.5%
Decimal Number
Value Count Frequency (%)
1 219
34.9%
2 140
22.3%
0 102
16.2%
3 66
10.5%
4 31
4.9%
7 25
4.0%
5 22
3.5%
8 10
1.6%
6 9
1.4%
9 4
0.6%
Close Punctuation
Value Count Frequency (%)
) 357
96.2%
] 14
3.8%
Open Punctuation
Value Count Frequency (%)
( 246
94.6%
[ 14
5.4%
Final Punctuation
Value Count Frequency (%)
45
73.8%
16
26.2%
Math Symbol
Value Count Frequency (%)
> 22
66.7%
~ 11
33.3%
Modifier Symbol
Value Count Frequency (%)
🏻 10
62.5%
^ 6
37.5%
Space Separator
Value Count Frequency (%)
(space) 44676
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 688
100.0%
Control
Value Count Frequency (%)
(control character) 260
100.0%
Dash Punctuation
Value Count Frequency (%)
- 187
100.0%
Initial Punctuation
Value Count Frequency (%)
10
100.0%
Nonspacing Mark
Value Count Frequency (%)
9
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 157939
55.0%
Cyrillic 73770
25.7%
Common 55558
19.3%
Inherited 9
< 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
44676
80.4%
? 4061
7.3%
. 1139
2.1%
, 1028
1.9%
_ 688
1.2%
: 606
1.1%
) 357
0.6%
¿ 301
0.5%
260
0.5%
' 256
0.5%
Other values (78) 2186
3.9%
Latin
Value Count Frequency (%)
e 16466
10.4%
a 16276
10.3%
o 13163
8.3%
i 10820
6.9%
t 10275
6.5%
n 10092
6.4%
r 9113
5.8%
s 8646
5.5%
u 7878
5.0%
h 6921
4.4%
Other values (56) 48289
30.6%
Cyrillic
Value Count Frequency (%)
а 10495
14.2%
э 5747
7.8%
н 4878
6.6%
г 3975
5.4%
й 3973
5.4%
р 3685
5.0%
д 3676
5.0%
л 3632
4.9%
х 3613
4.9%
о 3329
4.5%
Other values (53) 26767
36.3%
Inherited
Value Count Frequency (%)
9
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 211466
73.6%
Cyrillic 73770
25.7%
None 1600
0.6%
Emoticons 329
0.1%
Punctuation 71
< 0.1%
IPA Ext 12
< 0.1%
VS 9
< 0.1%
Dingbats 9
< 0.1%
Enclosed Alphanum Sup 6
< 0.1%
Misc Symbols 4
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
44676
21.1%
e 16466
7.8%
a 16276
7.7%
o 13163
6.2%
i 10820
5.1%
t 10275
4.9%
n 10092
4.8%
r 9113
4.3%
s 8646
4.1%
u 7878
3.7%
Other values (76) 64061
30.3%
Cyrillic
Value Count Frequency (%)
а 10495
14.2%
э 5747
7.8%
н 4878
6.6%
г 3975
5.4%
й 3973
5.4%
р 3685
5.0%
д 3676
5.0%
л 3632
4.9%
х 3613
4.9%
о 3329
4.5%
Other values (53) 26767
36.3%
None
Value Count Frequency (%)
¿ 301
18.8%
é 295
18.4%
á 220
13.8%
í 116
7.2%
ó 111
6.9%
ñ 107
6.7%
è 91
5.7%
👀 44
2.8%
ú 42
2.6%
ì 25
1.6%
Other values (26) 248
15.5%
Emoticons
Value Count Frequency (%)
😂 124
37.7%
😊 44
13.4%
😪 24
7.3%
🙏 14
4.3%
😥 12
3.6%
😌 12
3.6%
😇 10
3.0%
😐 8
2.4%
🙈 8
2.4%
🙄 8
2.4%
Other values (12) 65
19.8%
Punctuation
Value Count Frequency (%)
45
63.4%
16
22.5%
10
14.1%
IPA Ext
Value Count Frequency (%)
ə 12
100.0%
VS
Value Count Frequency (%)
9
100.0%
Misc Symbols
Value Count Frequency (%)
4
100.0%
Dingbats
Value Count Frequency (%)
4
44.4%
3
33.3%
2
22.2%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇰 3
50.0%
🇩 3
50.0%

transactions.messages.attributes.userId
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

User Id of the person asking the question

Distinct 150
Distinct (%) 4.0%
Missing 1066
Missing (%) 22.2%
Infinite 0
Infinite (%) 0.0%
Mean 338.7911087
Minimum 5
Maximum 510
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 37.6 KiB

Quantile statistics

Minimum 5
5-th percentile 94
Q1 330
median 381
Q3 426
95-th percentile 482
Maximum 510
Range 505
Interquartile range (IQR) 96

Descriptive statistics

Standard deviation 126.2933445
Coefficient of variation (CV) 0.3727764431
Kurtosis -0.144247945
Mean 338.7911087
Median Absolute Deviation (MAD) 47
Skewness -1.030884342
Sum 1265046
Variance 15950.00886
Monotonicity Not monotonic
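
Several of the figures above are derived from one another. The following Python sketch checks those relationships using only values copied from this report; the variable names are illustrative, and the formulas are the usual textbook definitions rather than a confirmed description of how the profiling tool computes them.

```python
# Values copied from the report for transactions.messages.attributes.userId
mean, std, total = 338.7911087, 126.2933445, 1265046
q1, q3, minimum, maximum = 330, 426, 5, 510
n_rows, n_missing = 4800, 1066

cv = std / mean                    # coefficient of variation
variance = std ** 2                # variance is the squared standard deviation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
n_present = round(total / mean)    # sum / mean recovers the non-missing count

print(cv, variance, iqr, value_range, n_present)
```

Under these definitions the recovered non-missing count matches the 4800 observations minus the 1066 missing values.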
Histogram with fixed size bins (bins=50)
Value Count Frequency (%)
347 192
4.0%
151 133
2.8%
94 96
2.0%
357 96
2.0%
211 92
1.9%
426 88
1.8%
394 85
1.8%
433 81
1.7%
355 80
1.7%
404 77
1.6%
Other values (140) 2714
56.5%
(Missing) 1066
22.2%
Value Count Frequency (%)
5 2
< 0.1%
8 1
< 0.1%
10 9
0.2%
14 1
< 0.1%
20 5
0.1%
22 65
1.4%
40 21
0.4%
49 15
0.3%
51 11
0.2%
81 18
0.4%
Value Count Frequency (%)
510 16
0.3%
509 3
0.1%
508 5
0.1%
507 9
0.2%
505 4
0.1%
504 20
0.4%
503 10
0.2%
501 9
0.2%
500 8
0.2%
499 9
0.2%

transactions.messages.attributes.anonymous
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Is the question anonymous?

Distinct 2
Distinct (%) 0.1%
Missing 3398
Missing (%) 70.8%
Memory size 37.6 KiB
0.0 1129
1.0 273

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 4206
Distinct characters 3
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 1129 23.5%
1.0 273 5.7%
(Missing) 3398 70.8%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0.0 1129
80.5%
1.0 273
19.5%
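
The Common Values table and the Category Frequency Plot table report different percentages for the same counts. The numbers are consistent with the first using all 4800 rows as the denominator and the second using only the non-missing rows. A minimal Python sketch of that reading follows; `frequency_tables` is a hypothetical helper, not a function of the profiling tool.

```python
def frequency_tables(counts, n_rows):
    """Return (share of all rows, share of non-missing rows) per value, in percent."""
    n_present = sum(counts.values())
    of_all = {k: round(100 * v / n_rows, 1) for k, v in counts.items()}
    of_present = {k: round(100 * v / n_present, 1) for k, v in counts.items()}
    # Rows without a value only appear in the table that covers all rows
    of_all["(Missing)"] = round(100 * (n_rows - n_present) / n_rows, 1)
    return of_all, of_present


of_all, of_present = frequency_tables({"0.0": 1129, "1.0": 273}, n_rows=4800)
print(of_all)
print(of_present)
```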

Most occurring characters

Value Count Frequency (%)
0 2531
60.2%
. 1402
33.3%
1 273
6.5%

Most occurring categories

Value Count Frequency (%)
Decimal Number 2804
66.7%
Other Punctuation 1402
33.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 2531
90.3%
1 273
9.7%
Other Punctuation
Value Count Frequency (%)
. 1402
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 4206
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 2531
60.2%
. 1402
33.3%
1 273
6.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 4206
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 2531
60.2%
. 1402
33.3%
1 273
6.5%

transactions.messages.attributes.sensitive
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Is the question sensitive?

Distinct 2
Distinct (%) 0.2%
Missing 3852
Missing (%) 80.2%
Memory size 37.6 KiB
0.0 764
1.0 184

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 2844
Distinct characters 3
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 764 15.9%
1.0 184 3.8%
(Missing) 3852 80.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0.0 764
80.6%
1.0 184
19.4%

Most occurring characters

Value Count Frequency (%)
0 1712
60.2%
. 948
33.3%
1 184
6.5%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1896
66.7%
Other Punctuation 948
33.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 1712
90.3%
1 184
9.7%
Other Punctuation
Value Count Frequency (%)
. 948
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 2844
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 1712
60.2%
. 948
33.3%
1 184
6.5%

Most occurring blocks

Value Count Frequency (%)
ASCII 2844
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 1712
60.2%
. 948
33.3%
1 184
6.5%

transactions.messages.attributes.positionOfAnswerer
Categorical

HIGH CORRELATION
MISSING

Physical proximity of the answerer to the questioner (anywhere or nearby)

Distinct 2
Distinct (%) 0.2%
Missing 3852
Missing (%) 80.2%
Memory size 37.6 KiB
anywhere 800
nearby 148

Length

Max length 8
Median length 8
Mean length 7.687763713
Min length 6

Characters and Unicode

Total characters 7288
Distinct characters 8
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row anywhere
2nd row anywhere
3rd row anywhere
4th row anywhere
5th row anywhere

Common Values

Value Count Frequency (%)
anywhere 800 16.7%
nearby 148 3.1%
(Missing) 3852 80.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
anywhere 800
84.4%
nearby 148
15.6%

Most occurring characters

Value Count Frequency (%)
e 1748
24.0%
a 948
13.0%
n 948
13.0%
y 948
13.0%
r 948
13.0%
w 800
11.0%
h 800
11.0%
b 148
2.0%
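
The character counts above follow directly from the two category values and their frequencies. The short Python sketch below reconstructs them, along with the total character count and the mean length, from the value counts reported for this variable. It is an illustration only, not the profiling tool's code.

```python
from collections import Counter

# Value counts reported for transactions.messages.attributes.positionOfAnswerer
value_counts = {"anywhere": 800, "nearby": 148}

# Weight every character of a value by the number of rows holding that value
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
mean_length = total_chars / sum(value_counts.values())
print(char_counts.most_common(), total_chars, mean_length)
```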

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 7288
100.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 1748
24.0%
a 948
13.0%
n 948
13.0%
y 948
13.0%
r 948
13.0%
w 800
11.0%
h 800
11.0%
b 148
2.0%

Most occurring scripts

Value Count Frequency (%)
Latin 7288
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 1748
24.0%
a 948
13.0%
n 948
13.0%
y 948
13.0%
r 948
13.0%
w 800
11.0%
h 800
11.0%
b 148
2.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 7288
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 1748
24.0%
a 948
13.0%
n 948
13.0%
y 948
13.0%
r 948
13.0%
w 800
11.0%
h 800
11.0%
b 148
2.0%

transactions.messages.attributes.transactionId
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Id of the transaction

Distinct 23
Distinct (%) 0.7%
Missing 1730
Missing (%) 36.0%
Infinite 0
Infinite (%) 0.0%
Mean 4.18990228
Minimum 1
Maximum 26
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 37.6 KiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 2
median 3
Q3 6
95-th percentile 10
Maximum 26
Range 25
Interquartile range (IQR) 4

Descriptive statistics

Standard deviation 3.069536619
Coefficient of variation (CV) 0.7326033912
Kurtosis 2.86150426
Mean 4.18990228
Median Absolute Deviation (MAD) 2
Skewness 1.381259449
Sum 12863
Variance 9.422055057
Monotonicity Not monotonic
Histogram with fixed size bins (bins=23)
Value Count Frequency (%)
1 640
13.3%
2 475
9.9%
3 431
9.0%
4 361
7.5%
5 337
7.0%
6 235
4.9%
7 172
3.6%
8 130
2.7%
9 87
1.8%
10 57
1.2%
Other values (13) 145
3.0%
(Missing) 1730
36.0%
Value Count Frequency (%)
1 640
13.3%
2 475
9.9%
3 431
9.0%
4 361
7.5%
5 337
7.0%
6 235
4.9%
7 172
3.6%
8 130
2.7%
9 87
1.8%
10 57
1.2%
Value Count Frequency (%)
26 1
< 0.1%
25 1
< 0.1%
21 1
< 0.1%
20 1
< 0.1%
19 1
< 0.1%
18 1
< 0.1%
17 1
< 0.1%
16 3
0.1%
15 5
0.1%
14 17
0.4%

transactions.messages.attributes.answer
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Answer to a question

Distinct 2695
Distinct (%) 96.4%
Missing 2003
Missing (%) 41.7%
Memory size 37.6 KiB
B 7
Yo 6
Si 5
No 5
❤️ 5
Other values (2690) 2769

Length

Max length 1122
Median length 471
Mean length 86.34823025
Min length 1

Characters and Unicode

Total characters 241516
Distinct characters 334
Distinct categories 19
Distinct scripts 6
Distinct blocks 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2632
Unique (%) 94.1%

Sample

1st row Hola! Para mi es la primera vez que participo!
2nd row Primera vez!
3rd row Participo por primera vez
4th row 1) conseguir un grupo de estudio, con la virtualidad es muy facil distraerse y un grupo puede ser presión util. 2)dar las clases en una habitación diferente a la que usas la computadora como recreación de ser posible 3) aprovechar los espacios para preguntar, ya sea en clase o por correo, preguntar fuera de horario con la virtualidad no esta tan mal visto
5th row Una pregunta complicada diria yo, pero a mi me sirve (creo) no dejar para despues nada, hacer todo a tiempo y no todo apurado. Otra cosa tambien importante es, establecer tiempos, decir que de tal a tal hora vas a estudiar, y cumplirlar para despues tener tiempo de descanso 😁👌

Common Values

Value Count Frequency (%)
B 7 0.1%
Yo 6 0.1%
Si 5 0.1%
No 5 0.1%
❤️ 5 0.1%
Sin piña 4 0.1%
Noche 4 0.1%
Yes 4 0.1%
Ok 4 0.1%
Atardecer 4 0.1%
Other values (2685) 2749 57.3%
(Missing) 2003 41.7%

Length

Histogram of lengths of the category
Value Count Frequency (%)
a 691
1.6%
i 653
1.5%
the 624
1.5%
to 504
1.2%
and 474
1.1%
it 348
0.8%
332
0.8%
of 300
0.7%
you 274
0.6%
нь 274
0.6%
Other values (11818) 37887
89.4%

Most occurring characters

Value Count Frequency (%)
(space) 39430
16.3%
e 15181
6.3%
a 12390
5.1%
o 11187
4.6%
t 9733
4.0%
i 9436
3.9%
n 8716
3.6%
а 8094
3.4%
s 7991
3.3%
r 7668
3.2%
Other values (324) 111690
46.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 187990
77.8%
Space Separator 39441
16.3%
Uppercase Letter 6201
2.6%
Other Punctuation 4948
2.0%
Decimal Number 951
0.4%
Other Symbol 575
0.2%
Close Punctuation 306
0.1%
Control 248
0.1%
Dash Punctuation 246
0.1%
Open Punctuation 229
0.1%
Other values (9) 381
0.2%

Most frequent character per category

Other Symbol
Value Count Frequency (%)
😂 90
15.7%
😅 39
6.8%
😊 33
5.7%
😁 21
3.7%
😉 20
3.5%
20
3.5%
👌 15
2.6%
😄 13
2.3%
😍 11
1.9%
10
1.7%
Other values (133) 303
52.7%
Lowercase Letter
Value Count Frequency (%)
e 15181
8.1%
a 12390
6.6%
o 11187
6.0%
t 9733
5.2%
i 9436
5.0%
n 8716
4.6%
а 8094
4.3%
s 7991
4.3%
r 7668
4.1%
l 6011
3.2%
Other values (67) 91583
48.7%
Uppercase Letter
Value Count Frequency (%)
I 943
15.2%
T 347
5.6%
S 335
5.4%
A 319
5.1%
N 238
3.8%
P 237
3.8%
M 236
3.8%
C 236
3.8%
B 227
3.7%
D 227
3.7%
Other values (48) 2856
46.1%
Other Punctuation
Value Count Frequency (%)
. 2184
44.1%
, 1543
31.2%
' 329
6.6%
: 309
6.2%
! 182
3.7%
? 112
2.3%
/ 111
2.2%
" 103
2.1%
% 19
0.4%
; 17
0.3%
Other values (6) 39
0.8%
Decimal Number
Value Count Frequency (%)
1 185
19.5%
0 178
18.7%
2 162
17.0%
3 128
13.5%
4 70
7.4%
5 66
6.9%
7 53
5.6%
6 43
4.5%
9 38
4.0%
8 28
2.9%
Math Symbol
Value Count Frequency (%)
+ 12
37.5%
= 9
28.1%
> 7
21.9%
~ 3
9.4%
< 1
3.1%
Final Punctuation
Value Count Frequency (%)
102
82.3%
21
16.9%
» 1
0.8%
Initial Punctuation
Value Count Frequency (%)
21
87.5%
2
8.3%
« 1
4.2%
Modifier Symbol
Value Count Frequency (%)
🏻 16
51.6%
^ 10
32.3%
🏼 5
16.1%
Other Letter
Value Count Frequency (%)
1
33.3%
1
33.3%
1
33.3%
Space Separator
Value Count Frequency (%)
39430
> 99.9%
11
< 0.1%
Close Punctuation
Value Count Frequency (%)
) 277
90.5%
] 29
9.5%
Open Punctuation
Value Count Frequency (%)
( 200
87.3%
[ 29
12.7%
Format
Value Count Frequency (%)
13
92.9%
1
7.1%
Control
Value Count Frequency (%)
248
100.0%
Dash Punctuation
Value Count Frequency (%)
- 246
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 100
100.0%
Nonspacing Mark
Value Count Frequency (%)
51
100.0%
Currency Symbol
Value Count Frequency (%)
£ 2
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 134114
55.5%
Cyrillic 60077
24.9%
Common 47258
19.6%
Inherited 64
< 0.1%
Hiragana 2
< 0.1%
Han 1
< 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
39430
83.4%
. 2184
4.6%
, 1543
3.3%
' 329
0.7%
: 309
0.7%
) 277
0.6%
248
0.5%
- 246
0.5%
( 200
0.4%
1 185
0.4%
Other values (184) 2307
4.9%
Latin
Value Count Frequency (%)
e 15181
11.3%
a 12390
9.2%
o 11187
8.3%
t 9733
7.3%
i 9436
7.0%
n 8716
6.5%
s 7991
6.0%
r 7668
5.7%
l 6011
4.5%
u 4898
3.7%
Other values (60) 40903
30.5%
Cyrillic
Value Count Frequency (%)
а 8094
13.5%
э 5321
8.9%
г 3769
6.3%
н 3617
6.0%
й 3246
5.4%
х 3237
5.4%
д 3184
5.3%
л 3035
5.1%
о 2968
4.9%
р 2739
4.6%
Other values (55) 20867
34.7%
Inherited
Value Count Frequency (%)
51
79.7%
13
20.3%
Hiragana
Value Count Frequency (%)
1
50.0%
1
50.0%
Han
Value Count Frequency (%)
1
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 180056
74.6%
Cyrillic 60077
24.9%
None 750
0.3%
Emoticons 322
0.1%
Punctuation 173
0.1%
VS 51
< 0.1%
Dingbats 36
< 0.1%
Misc Symbols 28
< 0.1%
Enclosed Alphanum Sup 18
< 0.1%
IPA Ext 2
< 0.1%
Other values (2) 3
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
39430
21.9%
e 15181
8.4%
a 12390
6.9%
o 11187
6.2%
t 9733
5.4%
i 9436
5.2%
n 8716
4.8%
s 7991
4.4%
r 7668
4.3%
l 6011
3.3%
Other values (80) 52313
29.1%
Cyrillic
Value Count Frequency (%)
а 8094
13.5%
э 5321
8.9%
г 3769
6.3%
н 3617
6.0%
й 3246
5.4%
х 3237
5.4%
д 3184
5.3%
л 3035
5.1%
о 2968
4.9%
р 2739
4.6%
Other values (55) 20867
34.7%
Punctuation
Value Count Frequency (%)
102
59.0%
21
12.1%
21
12.1%
13
7.5%
13
7.5%
2
1.2%
1
0.6%
Emoticons
Value Count Frequency (%)
😂 90
28.0%
😅 39
12.1%
😊 33
10.2%
😁 21
6.5%
😉 20
6.2%
😄 13
4.0%
😍 11
3.4%
🙂 10
3.1%
😭 10
3.1%
😎 7
2.2%
Other values (27) 68
21.1%
None
Value Count Frequency (%)
í 89
11.9%
è 80
10.7%
á 70
9.3%
é 61
8.1%
ó 49
6.5%
à 38
5.1%
ú 30
4.0%
ñ 29
3.9%
ø 21
2.8%
ù 21
2.8%
Other values (93) 262
34.9%
VS
Value Count Frequency (%)
51
100.0%
Dingbats
Value Count Frequency (%)
20
55.6%
10
27.8%
4
11.1%
2
5.6%
Misc Symbols
Value Count Frequency (%)
9
32.1%
7
25.0%
6
21.4%
1
3.6%
1
3.6%
1
3.6%
1
3.6%
1
3.6%
1
3.6%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇪 2
11.1%
🇸 2
11.1%
🇮 2
11.1%
🇨 2
11.1%
🇰 1
5.6%
🇦 1
5.6%
🇷 1
5.6%
🇩 1
5.6%
🇹 1
5.6%
🇵 1
5.6%
Other values (4) 4
22.2%
IPA Ext
Value Count Frequency (%)
ə 2
100.0%
Hiragana
Value Count Frequency (%)
1
50.0%
1
50.0%
CJK
Value Count Frequency (%)
1
100.0%

transactions.attributes.answer
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Answer to a question

Distinct 2702
Distinct (%) 96.4%
Missing 1996
Missing (%) 41.6%
Memory size 37.6 KiB
B 7
Yo 6
No 5
❤️ 5
Si 5
Other values (2697) 2776

Length

Max length 1122
Median length 471
Mean length 86.40014265
Min length 1

Characters and Unicode

Total characters 242266
Distinct characters 335
Distinct categories 19
Distinct scripts 6
Distinct blocks 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 2639
Unique (%) 94.1%

Sample

1st row Hola! Para mi es la primera vez que participo!
2nd row Primera vez!
3rd row Participo por primera vez
4th row 1) conseguir un grupo de estudio, con la virtualidad es muy facil distraerse y un grupo puede ser presión util. 2)dar las clases en una habitación diferente a la que usas la computadora como recreación de ser posible 3) aprovechar los espacios para preguntar, ya sea en clase o por correo, preguntar fuera de horario con la virtualidad no esta tan mal visto
5th row Una pregunta complicada diria yo, pero a mi me sirve (creo) no dejar para despues nada, hacer todo a tiempo y no todo apurado. Otra cosa tambien importante es, establecer tiempos, decir que de tal a tal hora vas a estudiar, y cumplirlar para despues tener tiempo de descanso 😁👌

Common Values

Value Count Frequency (%)
B 7 0.1%
Yo 6 0.1%
No 5 0.1%
❤️ 5 0.1%
Si 5 0.1%
Ok 4 0.1%
Noche 4 0.1%
Sin piña 4 0.1%
Ingeniería informática 4 0.1%
Yes 4 0.1%
Other values (2692) 2756 57.4%
(Missing) 1996 41.6%

Length

Histogram of lengths of the category
Value Count Frequency (%)
a 696
1.6%
i 657
1.5%
the 626
1.5%
to 507
1.2%
and 477
1.1%
it 350
0.8%
334
0.8%
of 302
0.7%
you 274
0.6%
нь 274
0.6%
Other values (11838) 38010
89.4%

Most occurring characters

Value Count Frequency (%)
(space) 39568
16.3%
e 15247
6.3%
a 12439
5.1%
o 11227
4.6%
t 9786
4.0%
i 9466
3.9%
n 8750
3.6%
а 8094
3.3%
s 8041
3.3%
r 7701
3.2%
Other values (325) 111947
46.2%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 188557
77.8%
Space Separator 39579
16.3%
Uppercase Letter 6187
2.6%
Other Punctuation 4968
2.1%
Decimal Number 972
0.4%
Other Symbol 578
0.2%
Close Punctuation 316
0.1%
Control 248
0.1%
Dash Punctuation 245
0.1%
Open Punctuation 238
0.1%
Other values (9) 378
0.2%

Most frequent character per category

Other Symbol
Value Count Frequency (%)
😂 91
15.7%
😅 39
6.7%
😊 33
5.7%
😁 21
3.6%
😉 20
3.5%
20
3.5%
👌 15
2.6%
😄 13
2.2%
😍 12
2.1%
😭 10
1.7%
Other values (133) 304
52.6%
Lowercase Letter
Value Count Frequency (%)
e 15247
8.1%
a 12439
6.6%
o 11227
6.0%
t 9786
5.2%
i 9466
5.0%
n 8750
4.6%
а 8094
4.3%
s 8041
4.3%
r 7701
4.1%
l 6036
3.2%
Other values (68) 91770
48.7%
Uppercase Letter
Value Count Frequency (%)
I 947
15.3%
T 350
5.7%
S 333
5.4%
A 319
5.2%
N 238
3.8%
C 238
3.8%
P 237
3.8%
M 232
3.7%
B 228
3.7%
D 227
3.7%
Other values (48) 2838
45.9%
Other Punctuation
Value Count Frequency (%)
. 2196
44.2%
, 1548
31.2%
' 331
6.7%
: 312
6.3%
! 182
3.7%
/ 111
2.2%
? 111
2.2%
" 103
2.1%
% 19
0.4%
; 17
0.3%
Other values (6) 38
0.8%
Decimal Number
Value Count Frequency (%)
1 187
19.2%
0 177
18.2%
2 161
16.6%
3 134
13.8%
4 76
7.8%
5 67
6.9%
7 53
5.5%
6 47
4.8%
9 40
4.1%
8 30
3.1%
Math Symbol
Value Count Frequency (%)
+ 12
40.0%
> 7
23.3%
= 7
23.3%
~ 3
10.0%
< 1
3.3%
Final Punctuation
Value Count Frequency (%)
102
82.3%
21
16.9%
» 1
0.8%
Initial Punctuation
Value Count Frequency (%)
21
87.5%
2
8.3%
« 1
4.2%
Modifier Symbol
Value Count Frequency (%)
🏻 16
51.6%
^ 10
32.3%
🏼 5
16.1%
Other Letter
Value Count Frequency (%)
1
33.3%
1
33.3%
1
33.3%
Space Separator
Value Count Frequency (%)
39568
> 99.9%
11
< 0.1%
Close Punctuation
Value Count Frequency (%)
) 280
88.6%
] 36
11.4%
Open Punctuation
Value Count Frequency (%)
( 202
84.9%
[ 36
15.1%
Format
Value Count Frequency (%)
13
92.9%
1
7.1%
Control
Value Count Frequency (%)
248
100.0%
Dash Punctuation
Value Count Frequency (%)
- 245
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 99
100.0%
Nonspacing Mark
Value Count Frequency (%)
51
100.0%
Currency Symbol
Value Count Frequency (%)
£ 2
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 134667
55.6%
Cyrillic 60077
24.8%
Common 47455
19.6%
Inherited 64
< 0.1%
Hiragana 2
< 0.1%
Han 1
< 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
39568
83.4%
. 2196
4.6%
, 1548
3.3%
' 331
0.7%
: 312
0.7%
) 280
0.6%
248
0.5%
- 245
0.5%
( 202
0.4%
1 187
0.4%
Other values (184) 2338
4.9%
Latin
Value Count Frequency (%)
e 15247
11.3%
a 12439
9.2%
o 11227
8.3%
t 9786
7.3%
i 9466
7.0%
n 8750
6.5%
s 8041
6.0%
r 7701
5.7%
l 6036
4.5%
u 4920
3.7%
Other values (61) 41054
30.5%
Cyrillic
Value Count Frequency (%)
а 8094
13.5%
э 5321
8.9%
г 3769
6.3%
н 3617
6.0%
й 3246
5.4%
х 3237
5.4%
д 3184
5.3%
л 3035
5.1%
о 2968
4.9%
р 2739
4.6%
Other values (55) 20867
34.7%
Inherited
Value Count Frequency (%)
51
79.7%
13
20.3%
Hiragana
Value Count Frequency (%)
1
50.0%
1
50.0%
Han
Value Count Frequency (%)
1
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 180800
74.6%
Cyrillic 60077
24.8%
None 754
0.3%
Emoticons 324
0.1%
Punctuation 173
0.1%
VS 51
< 0.1%
Dingbats 36
< 0.1%
Misc Symbols 28
< 0.1%
Enclosed Alphanum Sup 18
< 0.1%
IPA Ext 2
< 0.1%
Other values (2) 3
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
39568
21.9%
e 15247
8.4%
a 12439
6.9%
o 11227
6.2%
t 9786
5.4%
i 9466
5.2%
n 8750
4.8%
s 8041
4.4%
r 7701
4.3%
l 6036
3.3%
Other values (80) 52539
29.1%
Cyrillic
Value Count Frequency (%)
а 8094
13.5%
э 5321
8.9%
г 3769
6.3%
н 3617
6.0%
й 3246
5.4%
х 3237
5.4%
д 3184
5.3%
л 3035
5.1%
о 2968
4.9%
р 2739
4.6%
Other values (55) 20867
34.7%
Punctuation
Value Count Frequency (%)
102
59.0%
21
12.1%
21
12.1%
13
7.5%
13
7.5%
2
1.2%
1
0.6%
Emoticons
Value Count Frequency (%)
😂 91
28.1%
😅 39
12.0%
😊 33
10.2%
😁 21
6.5%
😉 20
6.2%
😄 13
4.0%
😍 12
3.7%
😭 10
3.1%
🙂 10
3.1%
😎 7
2.2%
Other values (27) 68
21.0%
None
Value Count Frequency (%)
í 89
11.8%
è 80
10.6%
á 70
9.3%
é 63
8.4%
ó 49
6.5%
à 38
5.0%
ú 30
4.0%
ñ 29
3.8%
ù 21
2.8%
ø 21
2.8%
Other values (94) 264
35.0%
VS
Value Count Frequency (%)
51
100.0%
Dingbats
Value Count Frequency (%)
20
55.6%
10
27.8%
4
11.1%
2
5.6%
Misc Symbols
Value Count Frequency (%)
9
32.1%
7
25.0%
6
21.4%
1
3.6%
1
3.6%
1
3.6%
1
3.6%
1
3.6%
1
3.6%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇮 2
11.1%
🇪 2
11.1%
🇸 2
11.1%
🇨 2
11.1%
🇷 1
5.6%
🇦 1
5.6%
🇰 1
5.6%
🇳 1
5.6%
🇵 1
5.6%
🇩 1
5.6%
Other values (4) 4
22.2%
IPA Ext
Value Count Frequency (%)
ə 2
100.0%
Hiragana
Value Count Frequency (%)
1
50.0%
1
50.0%
CJK
Value Count Frequency (%)
1
100.0%

transactions.attributes.anonymous
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
MISSING

Anonymous answer

Distinct 2
Distinct (%) 0.1%
Missing 1996
Missing (%) 41.6%
Memory size 37.6 KiB
0.0 2473
1.0 331

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 8412
Distinct characters 3
Distinct categories 2
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 2473 51.5%
1.0 331 6.9%
(Missing) 1996 41.6%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0.0 2473
88.2%
1.0 331
11.8%

Most occurring characters

Value Count Frequency (%)
0 5277
62.7%
. 2804
33.3%
1 331
3.9%

Most occurring categories

Value Count Frequency (%)
Decimal Number 5608
66.7%
Other Punctuation 2804
33.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 5277
94.1%
1 331
5.9%
Other Punctuation
Value Count Frequency (%)
. 2804
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 8412
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 5277
62.7%
. 2804
33.3%
1 331
3.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 8412
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 5277
62.7%
. 2804
33.3%
1 331
3.9%
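
The two percentage columns reported for transactions.attributes.anonymous appear to use different denominators: the Common Values table divides by all 4800 observations, while the Category Frequency Plot table divides by the 2804 non-missing values. A minimal Python sketch, using only counts copied from this report (the helper name `pct` is illustrative, not part of the profiler), reproduces both sets of figures:

```python
# Counts copied from the report for transactions.attributes.anonymous.
TOTAL_ROWS = 4800
MISSING = 1996
counts = {"0.0": 2473, "1.0": 331}


def pct(part, whole):
    """Percentage rounded to one decimal, as displayed in the report."""
    return round(100 * part / whole, 1)


# Non-missing values are whatever is left after removing missing cells.
non_missing = sum(counts.values())
assert non_missing + MISSING == TOTAL_ROWS

# Common Values: share of all rows, missing included.
share_of_all_rows = {value: pct(n, TOTAL_ROWS) for value, n in counts.items()}
# Category Frequency Plot: share of non-missing rows only.
share_of_non_missing = {value: pct(n, non_missing) for value, n in counts.items()}
missing_share = pct(MISSING, TOTAL_ROWS)

print(share_of_all_rows)
print(share_of_non_missing)
print(missing_share)
```

The same distinction applies to the other categorical variables in this report that carry missing values.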

transactions.attributes.reason
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Reason for accepting an answer

Distinct 257
Distinct (%) 91.1%
Missing 4518
Missing (%) 94.1%
Memory size 37.6 KiB
spam 5
abusive 4
Бүгдээрээ сайн уу? Миний хувьд амьдралын утга учрыг бодож ойрын хугацаанд их л сэтгэлийн тогтворгүй байдалтай байлаа. Гэхдээ эцэст нь юу ч болсон одоохондоо оюутан байгаа болохоор хичээлээ сурч байгаа цагтаа л сурсан шиг сурах хэрэгтэй гэдгээ бүр илүү ойлгох шиг. Ажлын байраа бодож стрессдэж байна уу? 2
Төлөвлөгөө ойр болон холын гэж ангилвал зүгээр санагддаг. Мэдээж өөрт хэрэгтэй апп уудаар сануулж болох ч аль болох байнга нүдэнд харагдахуйцаар бичиж тэмдэглээд шинэчлээд явбал зүгээр байдаг. 😊 Төлөвлөх бол маш чухал 😊 2
♥️ 2
Other values (252) 267

Length

Max length 303
Median length 94.5
Mean length 33.31914894
Min length 1

Characters and Unicode

Total characters 9396
Distinct characters 133
Distinct categories 12 ?
Distinct scripts 4 ?
Distinct blocks 7 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 237 ?
Unique (%) 84.0%

Sample

1st row No quería seguir preguntando
2nd row New insights
3rd row Porque responde a lo que quiero
4th row Fast answer
5th row Respuesta no depende de valores

Common Values

Value Count Frequency (%)
spam 5 0.1%
abusive 4 0.1%
Бүгдээрээ сайн уу? Миний хувьд амьдралын утга учрыг бодож ойрын хугацаанд их л сэтгэлийн тогтворгүй байдалтай байлаа. Гэхдээ эцэст нь юу ч болсон одоохондоо оюутан байгаа болохоор хичээлээ сурч байгаа цагтаа л сурсан шиг сурах хэрэгтэй гэдгээ бүр илүү ойлгох шиг. Ажлын байраа бодож стрессдэж байна уу? 2 < 0.1%
Төлөвлөгөө ойр болон холын гэж ангилвал зүгээр санагддаг. Мэдээж өөрт хэрэгтэй апп уудаар сануулж болох ч аль болох байнга нүдэнд харагдахуйцаар бичиж тэмдэглээд шинэчлээд явбал зүгээр байдаг. 😊 Төлөвлөх бол маш чухал 😊 2 < 0.1%
♥️ 2 < 0.1%
It is feasible, realistic and workable 2 < 0.1%
The answer was helpful 2 < 0.1%
... 2 < 0.1%
Thorough 2 < 0.1%
Interesting 2 < 0.1%
Other values (247) 257 5.4%
(Missing) 4518 94.1%

Length

Histogram of lengths of the category
Value Count Frequency (%)
it 52
3.2%
was 38
2.3%
the 36
2.2%
answer 36
2.2%
i 34
2.1%
and 31
1.9%
a 27
1.7%
la 19
1.2%
good 14
0.9%
is 14
0.9%
Other values (761) 1334
81.6%

Most occurring characters

Value Count Frequency (%)
(space) 1354
14.4%
e 728
7.7%
a 557
5.9%
s 456
4.9%
o 432
4.6%
t 422
4.5%
i 391
4.2%
n 380
4.0%
r 331
3.5%
а 272
2.9%
Other values (123) 4073
43.3%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 7608
81.0%
Space Separator 1354
14.4%
Uppercase Letter 282
3.0%
Other Punctuation 106
1.1%
Other Symbol 18
0.2%
Decimal Number 8
0.1%
Dash Punctuation 5
0.1%
Nonspacing Mark 4
< 0.1%
Close Punctuation 4
< 0.1%
Control 3
< 0.1%
Other values (2) 4
< 0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 728
9.6%
a 557
7.3%
s 456
6.0%
o 432
5.7%
t 422
5.5%
i 391
5.1%
n 380
5.0%
r 331
4.4%
а 272
3.6%
l 221
2.9%
Other values (56) 3418
44.9%
Uppercase Letter
Value Count Frequency (%)
I 68
24.1%
P 26
9.2%
N 19
6.7%
S 16
5.7%
T 11
3.9%
M 10
3.5%
Х 9
3.2%
G 8
2.8%
B 8
2.8%
R 8
2.8%
Other values (30) 99
35.1%
Other Symbol
Value Count Frequency (%)
😊 5
27.8%
😢 2
11.1%
😘 2
11.1%
2
11.1%
2
11.1%
😭 1
5.6%
1
5.6%
👍 1
5.6%
😍 1
5.6%
😁 1
5.6%
Other Punctuation
Value Count Frequency (%)
. 64
60.4%
, 20
18.9%
' 12
11.3%
? 5
4.7%
! 3
2.8%
: 2
1.9%
Decimal Number
Value Count Frequency (%)
0 4
50.0%
3 2
25.0%
1 1
12.5%
2 1
12.5%
Space Separator
Value Count Frequency (%)
(space) 1354
100.0%
Dash Punctuation
Value Count Frequency (%)
- 5
100.0%
Nonspacing Mark
Value Count Frequency (%)
4
100.0%
Close Punctuation
Value Count Frequency (%)
) 4
100.0%
Control
Value Count Frequency (%)
3
100.0%
Modifier Symbol
Value Count Frequency (%)
^ 2
100.0%
Open Punctuation
Value Count Frequency (%)
( 2
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 5751
61.2%
Cyrillic 2139
22.8%
Common 1502
16.0%
Inherited 4
< 0.1%

Most frequent character per script

Latin
Value Count Frequency (%)
e 728
12.7%
a 557
9.7%
s 456
7.9%
o 432
7.5%
t 422
7.3%
i 391
6.8%
n 380
6.6%
r 331
5.8%
l 221
3.8%
u 187
3.3%
Other values (49) 1646
28.6%
Cyrillic
Value Count Frequency (%)
а 272
12.7%
э 169
7.9%
о 137
6.4%
н 135
6.3%
л 129
6.0%
г 125
5.8%
й 116
5.4%
х 108
5.0%
д 107
5.0%
р 96
4.5%
Other values (37) 745
34.8%
Common
Value Count Frequency (%)
(space) 1354
90.1%
. 64
4.3%
, 20
1.3%
' 12
0.8%
- 5
0.3%
? 5
0.3%
😊 5
0.3%
0 4
0.3%
) 4
0.3%
3
0.2%
Other values (16) 26
1.7%
Inherited
Value Count Frequency (%)
4
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 7196
76.6%
Cyrillic 2139
22.8%
None 40
0.4%
Emoticons 12
0.1%
VS 4
< 0.1%
Dingbats 3
< 0.1%
Misc Symbols 2
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 1354
18.8%
e 728
10.1%
a 557
7.7%
s 456
6.3%
o 432
6.0%
t 422
5.9%
i 391
5.4%
n 380
5.3%
r 331
4.6%
l 221
3.1%
Other values (55) 1924
26.7%
Cyrillic
Value Count Frequency (%)
а 272
12.7%
э 169
7.9%
о 137
6.4%
н 135
6.3%
л 129
6.0%
г 125
5.8%
й 116
5.4%
х 108
5.0%
д 107
5.0%
р 96
4.5%
Other values (37) 745
34.8%
None
Value Count Frequency (%)
é 10
25.0%
í 10
25.0%
ó 4
10.0%
è 4
10.0%
È 3
7.5%
Ú 3
7.5%
ù 2
5.0%
á 1
2.5%
ø 1
2.5%
ò 1
2.5%
Emoticons
Value Count Frequency (%)
😊 5
41.7%
😢 2
16.7%
😘 2
16.7%
😭 1
8.3%
😍 1
8.3%
😁 1
8.3%
VS
Value Count Frequency (%)
4
100.0%
Dingbats
Value Count Frequency (%)
2
66.7%
1
33.3%
Misc Symbols
Value Count Frequency (%)
2
100.0%

transactions.attributes.transactionId
Real number (ℝ ≥0 )

HIGH CORRELATION (4 alerts)
MISSING

Id of transaction

Distinct 14
Distinct (%) 5.1%
Missing 4524
Missing (%) 94.2%
Infinite 0
Infinite (%) 0.0%
Mean 3.673913043
Minimum 1
Maximum 14
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 37.6 KiB

Quantile statistics

Minimum 1
5-th percentile 1
Q1 1
median 3
Q3 5
95-th percentile 9
Maximum 14
Range 13
Interquartile range (IQR) 4

Descriptive statistics

Standard deviation 2.922422516
Coefficient of variation (CV) 0.7954522824
Kurtosis 1.699966041
Mean 3.673913043
Median Absolute Deviation (MAD) 2
Skewness 1.356492791
Sum 1014
Variance 8.54055336
Monotonicity Not monotonic
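
The descriptive statistics above can be cross-checked against the value counts listed for transactions.attributes.transactionId in this section. The sketch below is a hedged reconstruction, not the profiler's own code: it rebuilds the 276 non-missing observations from the frequency tables and assumes the reported variance is the sample variance (divisor n - 1), which is the reading that matches the printed figures.

```python
import statistics

# Value -> count, copied from the frequency tables for
# transactions.attributes.transactionId in this report.
value_counts = {1: 85, 2: 39, 3: 35, 4: 30, 5: 30, 6: 15, 7: 10,
                8: 11, 9: 9, 10: 2, 11: 2, 12: 2, 13: 3, 14: 3}

# Expand the table back into one entry per non-missing observation.
observations = [v for v, n in value_counts.items() for _ in range(n)]

n = len(observations)                         # 4800 rows - 4524 missing
total = sum(observations)                     # "Sum" in the report
mean = statistics.mean(observations)
variance = statistics.variance(observations)  # sample variance (n - 1)
std_dev = statistics.stdev(observations)
cv = std_dev / mean                           # coefficient of variation
median = statistics.median(observations)

print(n, total, mean, variance, std_dev, cv, median)
```

The coefficient of variation is simply the standard deviation divided by the mean, so it can be recomputed from the two reported figures without the raw data.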
Histogram with fixed size bins (bins=14)
Value Count Frequency (%)
1 85 1.8%
2 39 0.8%
3 35 0.7%
4 30 0.6%
5 30 0.6%
6 15 0.3%
8 11 0.2%
7 10 0.2%
9 9 0.2%
13 3 0.1%
Other values (4) 9 0.2%
(Missing) 4524 94.2%
Value Count Frequency (%)
1 85 1.8%
2 39 0.8%
3 35 0.7%
4 30 0.6%
5 30 0.6%
6 15 0.3%
7 10 0.2%
8 11 0.2%
9 9 0.2%
10 2 < 0.1%
Value Count Frequency (%)
14 3 0.1%
13 3 0.1%
12 2 < 0.1%
11 2 < 0.1%
10 2 < 0.1%
9 9 0.2%
8 11 0.2%
7 10 0.2%
6 15 0.3%
5 30 0.6%

transactions.attributes.helpful
Categorical

MISSING

How helpful was the accepted answer on transaction

Distinct 5
Distinct (%) 1.8%
Missing 4527
Missing (%) 94.3%
Memory size 37.6 KiB
veryHelpful 112
extremelyHelpful 97
slightlyHelpful 37
somewhatHelpful 18
notAtAllHelpful 9

Length

Max length 16
Median length 15
Mean length 13.71428571
Min length 11

Characters and Unicode

Total characters 3744
Distinct characters 21
Distinct categories 2 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row extremelyHelpful
2nd row extremelyHelpful
3rd row veryHelpful
4th row extremelyHelpful
5th row extremelyHelpful

Common Values

Value Count Frequency (%)
veryHelpful 112 2.3%
extremelyHelpful 97 2.0%
slightlyHelpful 37 0.8%
somewhatHelpful 18 0.4%
notAtAllHelpful 9 0.2%
(Missing) 4527 94.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
veryhelpful 112
41.0%
extremelyhelpful 97
35.5%
slightlyhelpful 37
13.6%
somewhathelpful 18
6.6%
notatallhelpful 9
3.3%

Most occurring characters

Value Count Frequency (%)
l 735
19.6%
e 694
18.5%
H 273
7.3%
p 273
7.3%
f 273
7.3%
u 273
7.3%
y 246
6.6%
r 209
5.6%
t 170
4.5%
m 115
3.1%
Other values (11) 483
12.9%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 3453
92.2%
Uppercase Letter 291
7.8%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
l 735
21.3%
e 694
20.1%
p 273
7.9%
f 273
7.9%
u 273
7.9%
y 246
7.1%
r 209
6.1%
t 170
4.9%
m 115
3.3%
v 112
3.2%
Other values (9) 353
10.2%
Uppercase Letter
Value Count Frequency (%)
H 273
93.8%
A 18
6.2%

Most occurring scripts

Value Count Frequency (%)
Latin 3744
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
l 735
19.6%
e 694
18.5%
H 273
7.3%
p 273
7.3%
f 273
7.3%
u 273
7.3%
y 246
6.6%
r 209
5.6%
t 170
4.5%
m 115
3.1%
Other values (11) 483
12.9%

Most occurring blocks

Value Count Frequency (%)
ASCII 3744
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
l 735
19.6%
e 694
18.5%
H 273
7.3%
p 273
7.3%
f 273
7.3%
u 273
7.3%
y 246
6.6%
r 209
5.6%
t 170
4.5%
m 115
3.1%
Other values (11) 483
12.9%
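
The length and character statistics for transactions.attributes.helpful follow directly from the five category labels and their counts. As a rough illustration (a sketch built only from the counts shown in this report, not the profiler's implementation), the totals can be recomputed like this:

```python
from collections import Counter

# Category -> count, copied from the Common Values table above.
value_counts = {
    "veryHelpful": 112,
    "extremelyHelpful": 97,
    "slightlyHelpful": 37,
    "somewhatHelpful": 18,
    "notAtAllHelpful": 9,
}

n_values = sum(value_counts.values())
total_chars = sum(len(value) * n for value, n in value_counts.items())
mean_length = total_chars / n_values

# Weight each character of a label by how often the label occurs.
char_counts = Counter()
for value, n in value_counts.items():
    for ch in value:
        char_counts[ch] += n

uppercase_total = sum(c for ch, c in char_counts.items() if ch.isupper())

print(n_values, total_chars, round(mean_length, 8))
print(char_counts.most_common(3), uppercase_total)
```

Because every label ends in "Helpful", the letter H occurs once per non-missing value, and the only other uppercase letter, A, comes from the two capitals in notAtAllHelpful.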

goal.name
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Question (without duplicating extended questions)

Distinct 688
Distinct (%) 94.9%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
Та хэд энэ 3р тунгийн талаар ямар бодолтой байна. 7
Cant stop, wont stop ___ ? 4
Lomito árabe o hamburguesa? 4
Do you feel COVID has affected 2021-2022 academic year? How? 3
Cheap eats in Copenhagen? 🤓 3
Other values (683) 704

Length

Max length 1392
Median length 200
Mean length 72.77655172
Min length 3

Characters and Unicode

Total characters 52763
Distinct characters 218
Distinct categories 16 ?
Distinct scripts 4 ?
Distinct blocks 10 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 663 ?
Unique (%) 91.4%

Sample

1st row Hola a todos, quería saber quienes participan en el chat box por primera vez y quienes somos los que disfrutamos del primer experimento y quisimos volver
2nd row 3 principios para estudiar efectivamente con clases virtuales?
3rd row Hola?
4th row Las preguntas y respuestas vienen con una firma: -Giovanni por ejemplo. Eso se agrega de forma automatica? No agregue firma a este mensaje
5th row Cuál es la capital de Afganistán?

Common Values

Value Count Frequency (%)
Та хэд энэ 3р тунгийн талаар ямар бодолтой байна. 7 0.1%
Cant stop, wont stop ___ ? 4 0.1%
Lomito árabe o hamburguesa? 4 0.1%
Do you feel COVID has affected 2021-2022 academic year? How? 3 0.1%
Cheap eats in Copenhagen? 🤓 3 0.1%
Mac or Windows - why? 3 0.1%
Any recommendation with books? 2 < 0.1%
Tecnica para pasar info 1, es urgente 😂😂 2 < 0.1%
algún consejo para estudiar una materia 'leída'? 2 < 0.1%
yes 2 < 0.1%
Other values (678) 693 14.4%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category
Value Count Frequency (%)
you 163
1.8%
the 108
1.2%
a 99
1.1%
in 94
1.1%
82
0.9%
байна 79
0.9%
is 78
0.9%
do 73
0.8%
what 71
0.8%
to 71
0.8%
Other values (3610) 7967
89.7%

Most occurring characters

Value Count Frequency (%)
(space) 8142
15.4%
e 3134
5.9%
a 3088
5.9%
o 2539
4.8%
i 2084
3.9%
n 1973
3.7%
t 1960
3.7%
а 1767
3.3%
r 1755
3.3%
s 1654
3.1%
Other values (208) 24667
46.8%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 41089
77.9%
Space Separator 8142
15.4%
Other Punctuation 1430
2.7%
Uppercase Letter 1399
2.7%
Connector Punctuation 200
0.4%
Decimal Number 129
0.2%
Other Symbol 114
0.2%
Control 74
0.1%
Close Punctuation 65
0.1%
Open Punctuation 48
0.1%
Other values (6) 73
0.1%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 3134
7.6%
a 3088
7.5%
o 2539
6.2%
i 2084
5.1%
n 1973
4.8%
t 1960
4.8%
а 1767
4.3%
r 1755
4.3%
s 1654
4.0%
u 1468
3.6%
Other values (61) 19667
47.9%
Uppercase Letter
Value Count Frequency (%)
C 124
8.9%
W 122
8.7%
A 96
6.9%
D 79
5.6%
I 78
5.6%
H 69
4.9%
Q 64
4.6%
P 53
3.8%
S 46
3.3%
E 40
2.9%
Other values (48) 628
44.9%
Other Symbol
Value Count Frequency (%)
😂 23
20.2%
😊 9
7.9%
👀 9
7.9%
🤣 5
4.4%
🤗 4
3.5%
🙏 4
3.5%
💪 3
2.6%
🤓 3
2.6%
🥺 3
2.6%
🔥 3
2.6%
Other values (37) 48
42.1%
Other Punctuation
Value Count Frequency (%)
? 706
49.4%
, 201
14.1%
. 199
13.9%
: 143
10.0%
¿ 50
3.5%
' 50
3.5%
/ 33
2.3%
! 21
1.5%
" 14
1.0%
; 5
0.3%
Other values (6) 8
0.6%
Decimal Number
Value Count Frequency (%)
1 40
31.0%
2 31
24.0%
0 22
17.1%
3 17
13.2%
4 5
3.9%
5 4
3.1%
8 4
3.1%
7 4
3.1%
9 1
0.8%
6 1
0.8%
Close Punctuation
Value Count Frequency (%)
) 63
96.9%
] 2
3.1%
Open Punctuation
Value Count Frequency (%)
( 46
95.8%
[ 2
4.2%
Final Punctuation
Value Count Frequency (%)
6
50.0%
6
50.0%
Math Symbol
Value Count Frequency (%)
~ 4
57.1%
> 3
42.9%
Modifier Symbol
Value Count Frequency (%)
🏻 3
75.0%
^ 1
25.0%
Space Separator
Value Count Frequency (%)
(space) 8142
100.0%
Connector Punctuation
Value Count Frequency (%)
_ 200
100.0%
Control
Value Count Frequency (%)
74
100.0%
Dash Punctuation
Value Count Frequency (%)
- 42
100.0%
Initial Punctuation
Value Count Frequency (%)
5
100.0%
Nonspacing Mark
Value Count Frequency (%)
3
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 30121
57.1%
Cyrillic 12367
23.4%
Common 10272
19.5%
Inherited 3
< 0.1%

Most frequent character per script

Common
Value Count Frequency (%)
(space) 8142
79.3%
? 706
6.9%
, 201
2.0%
_ 200
1.9%
. 199
1.9%
: 143
1.4%
74
0.7%
) 63
0.6%
¿ 50
0.5%
' 50
0.5%
Other values (78) 444
4.3%
Latin
Value Count Frequency (%)
e 3134
10.4%
a 3088
10.3%
o 2539
8.4%
i 2084
6.9%
n 1973
6.6%
t 1960
6.5%
r 1755
5.8%
s 1654
5.5%
u 1468
4.9%
h 1248
4.1%
Other values (56) 9218
30.6%
Cyrillic
Value Count Frequency (%)
а 1767
14.3%
э 973
7.9%
н 821
6.6%
й 667
5.4%
г 651
5.3%
х 618
5.0%
р 610
4.9%
л 603
4.9%
д 595
4.8%
о 544
4.4%
Other values (53) 4518
36.5%
Inherited
Value Count Frequency (%)
3
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 39994
75.8%
Cyrillic 12367
23.4%
None 310
0.6%
Emoticons 64
0.1%
Punctuation 17
< 0.1%
Dingbats 4
< 0.1%
VS 3
< 0.1%
Enclosed Alphanum Sup 2
< 0.1%
Misc Symbols 1
< 0.1%
IPA Ext 1
< 0.1%

Most frequent character per block

ASCII
Value Count Frequency (%)
(space) 8142
20.4%
e 3134
7.8%
a 3088
7.7%
o 2539
6.3%
i 2084
5.2%
n 1973
4.9%
t 1960
4.9%
r 1755
4.4%
s 1654
4.1%
u 1468
3.7%
Other values (76) 12197
30.5%
Cyrillic
Value Count Frequency (%)
а 1767
14.3%
э 973
7.9%
н 821
6.6%
й 667
5.4%
г 651
5.3%
х 618
5.0%
р 610
4.9%
л 603
4.9%
д 595
4.8%
о 544
4.4%
Other values (53) 4518
36.5%
None
Value Count Frequency (%)
¿ 50
16.1%
é 50
16.1%
á 40
12.9%
í 23
7.4%
è 23
7.4%
ó 18
5.8%
ñ 17
5.5%
ú 10
3.2%
👀 9
2.9%
ù 9
2.9%
Other values (26) 61
19.7%
Emoticons
Value Count Frequency (%)
😂 23
35.9%
😊 9
14.1%
🙏 4
6.2%
😪 3
4.7%
🙈 2
3.1%
😉 2
3.1%
😇 2
3.1%
😥 2
3.1%
🙄 2
3.1%
😭 2
3.1%
Other values (12) 13
20.3%
Punctuation
Value Count Frequency (%)
6
35.3%
6
35.3%
5
29.4%
VS
Value Count Frequency (%)
3
100.0%
Dingbats
Value Count Frequency (%)
2
50.0%
1
25.0%
1
25.0%
Enclosed Alphanum Sup
Value Count Frequency (%)
🇩 1
50.0%
🇰 1
50.0%
Misc Symbols
Value Count Frequency (%)
1
100.0%
IPA Ext
Value Count Frequency (%)
ə 1
100.0%

goal.description
Unsupported

MISSING
REJECTED
UNSUPPORTED

Empty column

Missing 4800
Missing (%) 100.0%
Memory size 37.6 KiB

attributes.domain
Categorical

MISSING

Question’s domain

Distinct 11
Distinct (%) 1.5%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
varia_misc 255
life_ponders 133
studying_career 115
local_university 45
local_things 43
Other values (6) 134

Length

Max length 18
Median length 17
Mean length 12.26068966
Min length 5

Characters and Unicode

Total characters 8889
Distinct characters 21
Distinct categories 2 ?
Distinct scripts 2 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row varia_misc
2nd row studying_career
3rd row varia_misc
4th row varia_misc
5th row local_things

Common Values

Value Count Frequency (%)
varia_misc 255 5.3%
life_ponders 133 2.8%
studying_career 115 2.4%
local_university 45 0.9%
local_things 43 0.9%
food_and_cooking 38 0.8%
cinema_theatre 36 0.8%
music 30 0.6%
cultural_interests 16 0.3%
physical_activity 11 0.2%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category
Value Count Frequency (%)
varia_misc 255
35.2%
life_ponders 133
18.3%
studying_career 115
15.9%
local_university 45
6.2%
local_things 43
5.9%
food_and_cooking 38
5.2%
cinema_theatre 36
5.0%
music 30
4.1%
cultural_interests 16
2.2%
physical_activity 11
1.5%

Most occurring characters

Value Count Frequency (%)
i 1044
11.7%
a 870
9.8%
r 737
8.3%
_ 736
8.3%
e 681
7.7%
s 670
7.5%
c 603
6.8%
n 467
5.3%
o 373
4.2%
l 352
4.0%
Other values (11) 2356
26.5%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 8153
91.7%
Connector Punctuation 736
8.3%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
i 1044
12.8%
a 870
10.7%
r 737
9.0%
e 681
8.4%
s 670
8.2%
c 603
7.4%
n 467
5.7%
o 373
4.6%
l 352
4.3%
t 351
4.3%
Other values (10) 2005
24.6%
Connector Punctuation
Value Count Frequency (%)
_ 736
100.0%

Most occurring scripts

Value Count Frequency (%)
Latin 8153
91.7%
Common 736
8.3%

Most frequent character per script

Latin
Value Count Frequency (%)
i 1044
12.8%
a 870
10.7%
r 737
9.0%
e 681
8.4%
s 670
8.2%
c 603
7.4%
n 467
5.7%
o 373
4.6%
l 352
4.3%
t 351
4.3%
Other values (10) 2005
24.6%
Common
Value Count Frequency (%)
_ 736
100.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 8889
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
i 1044
11.7%
a 870
9.8%
r 737
8.3%
_ 736
8.3%
e 681
7.7%
s 670
7.5%
c 603
6.8%
n 467
5.3%
o 373
4.2%
l 352
4.0%
Other values (11) 2356
26.5%
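
The report states 11 distinct values for attributes.domain, yet the Common Values table lists only ten. The count attached to the unlisted category can be derived from the figures already shown; its name does not appear anywhere in this section, so the sketch below deliberately leaves it unnamed.

```python
# Figures copied from the report for attributes.domain.
TOTAL_ROWS = 4800
MISSING = 4075
listed = {
    "varia_misc": 255,
    "life_ponders": 133,
    "studying_career": 115,
    "local_university": 45,
    "local_things": 43,
    "food_and_cooking": 38,
    "cinema_theatre": 36,
    "music": 30,
    "cultural_interests": 16,
    "physical_activity": 11,
}

non_missing = TOTAL_ROWS - MISSING

# "Other values (6)" in the overview covers everything beyond the top five.
top_five_total = sum(sorted(listed.values(), reverse=True)[:5])
other_after_top_five = non_missing - top_five_total

# Whatever the ten listed categories do not account for belongs
# to the eleventh, unlisted category.
unlisted_count = non_missing - sum(listed.values())

print(non_missing, other_after_top_five, unlisted_count)
```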

attributes.domainInterest
Categorical

HIGH CORRELATION
MISSING

Similar-different domain

Distinct 3
Distinct (%) 0.4%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
indifferent 402
similar 212
different 111

Length

Max length 11
Median length 11
Mean length 9.524137931
Min length 7

Characters and Unicode

Total characters 6905
Distinct characters 11
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row indifferent
2nd row indifferent
3rd row indifferent
4th row indifferent
5th row similar

Common Values

Value Count Frequency (%)
indifferent 402 8.4%
similar 212 4.4%
different 111 2.3%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
indifferent 402
55.4%
similar 212
29.2%
different 111
15.3%

Most occurring characters

Value Count Frequency (%)
i 1339
19.4%
f 1026
14.9%
e 1026
14.9%
n 915
13.3%
r 725
10.5%
d 513
7.4%
t 513
7.4%
s 212
3.1%
m 212
3.1%
l 212
3.1%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 6905
100.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
i 1339
19.4%
f 1026
14.9%
e 1026
14.9%
n 915
13.3%
r 725
10.5%
d 513
7.4%
t 513
7.4%
s 212
3.1%
m 212
3.1%
l 212
3.1%

Most occurring scripts

Value Count Frequency (%)
Latin 6905
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
i 1339
19.4%
f 1026
14.9%
e 1026
14.9%
n 915
13.3%
r 725
10.5%
d 513
7.4%
t 513
7.4%
s 212
3.1%
m 212
3.1%
l 212
3.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 6905
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
i 1339
19.4%
f 1026
14.9%
e 1026
14.9%
n 915
13.3%
r 725
10.5%
d 513
7.4%
t 513
7.4%
s 212
3.1%
m 212
3.1%
l 212
3.1%

attributes.beliefsAndValues
Categorical

HIGH CORRELATION
MISSING

Similar-different beliefs and values

Distinct 3
Distinct (%) 0.4%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
indifferent 459
similar 151
different 115

Length

Max length 11
Median length 11
Mean length 9.849655172
Min length 7

Characters and Unicode

Total characters 7141
Distinct characters 11
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row indifferent
2nd row similar
3rd row different
4th row indifferent
5th row similar

Common Values

Value Count Frequency (%)
indifferent 459 9.6%
similar 151 3.1%
different 115 2.4%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
indifferent 459
63.3%
similar 151
20.8%
different 115
15.9%

Most occurring characters

Value Count Frequency (%)
i 1335
18.7%
f 1148
16.1%
e 1148
16.1%
n 1033
14.5%
r 725
10.2%
d 574
8.0%
t 574
8.0%
s 151
2.1%
m 151
2.1%
l 151
2.1%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 7141
100.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
i 1335
18.7%
f 1148
16.1%
e 1148
16.1%
n 1033
14.5%
r 725
10.2%
d 574
8.0%
t 574
8.0%
s 151
2.1%
m 151
2.1%
l 151
2.1%

Most occurring scripts

Value Count Frequency (%)
Latin 7141
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
i 1335
18.7%
f 1148
16.1%
e 1148
16.1%
n 1033
14.5%
r 725
10.2%
d 574
8.0%
t 574
8.0%
s 151
2.1%
m 151
2.1%
l 151
2.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 7141
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
i 1335
18.7%
f 1148
16.1%
e 1148
16.1%
n 1033
14.5%
r 725
10.2%
d 574
8.0%
t 574
8.0%
s 151
2.1%
m 151
2.1%
l 151
2.1%

attributes.sensitive
Categorical

HIGH CORRELATION (4 alerts)
MISSING

Is it a sensitive question

Distinct 2
Distinct (%) 0.3%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
0.0 587
1.0 138

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 2175
Distinct characters 3
Distinct categories 2 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 587 12.2%
1.0 138 2.9%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0.0 587
81.0%
1.0 138
19.0%

Most occurring characters

Value Count Frequency (%)
0 1312
60.3%
. 725
33.3%
1 138
6.3%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1450
66.7%
Other Punctuation 725
33.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 1312
90.5%
1 138
9.5%
Other Punctuation
Value Count Frequency (%)
. 725
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 2175
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 1312
60.3%
. 725
33.3%
1 138
6.3%

Most occurring blocks

Value Count Frequency (%)
ASCII 2175
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 1312
60.3%
. 725
33.3%
1 138
6.3%

attributes.anonymous
Categorical

HIGH CORRELATION (4 alerts)
MISSING

Is it an anonymous question

Distinct 2
Distinct (%) 0.3%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
0.0 601
1.0 124

Length

Max length 3
Median length 3
Mean length 3
Min length 3

Characters and Unicode

Total characters 2175
Distinct characters 3
Distinct categories 2 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row 0.0
2nd row 0.0
3rd row 0.0
4th row 0.0
5th row 0.0

Common Values

Value Count Frequency (%)
0.0 601 12.5%
1.0 124 2.6%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
0.0 601
82.9%
1.0 124
17.1%

Most occurring characters

Value Count Frequency (%)
0 1326
61.0%
. 725
33.3%
1 124
5.7%

Most occurring categories

Value Count Frequency (%)
Decimal Number 1450
66.7%
Other Punctuation 725
33.3%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
0 1326
91.4%
1 124
8.6%
Other Punctuation
Value Count Frequency (%)
. 725
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 2175
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
0 1326
61.0%
. 725
33.3%
1 124
5.7%

Most occurring blocks

Value Count Frequency (%)
ASCII 2175
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
0 1326
61.0%
. 725
33.3%
1 124
5.7%

attributes.socialCloseness
Categorical

HIGH CORRELATION
MISSING

Close-far social closeness

Distinct 3
Distinct (%) 0.4%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
indifferent 534
similar 100
different 91

Length

Max length 11
Median length 11
Mean length 10.19724138
Min length 7

Characters and Unicode

Total characters 7393
Distinct characters 11
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row indifferent
2nd row indifferent
3rd row indifferent
4th row indifferent
5th row indifferent

Common Values

Value Count Frequency (%)
indifferent 534 11.1%
similar 100 2.1%
different 91 1.9%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
indifferent 534
73.7%
similar 100
13.8%
different 91
12.6%

Most occurring characters

Value Count Frequency (%)
i 1359
18.4%
f 1250
16.9%
e 1250
16.9%
n 1159
15.7%
r 725
9.8%
d 625
8.5%
t 625
8.5%
s 100
1.4%
m 100
1.4%
l 100
1.4%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 7393
100.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
i 1359
18.4%
f 1250
16.9%
e 1250
16.9%
n 1159
15.7%
r 725
9.8%
d 625
8.5%
t 625
8.5%
s 100
1.4%
m 100
1.4%
l 100
1.4%

Most occurring scripts

Value Count Frequency (%)
Latin 7393
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
i 1359
18.4%
f 1250
16.9%
e 1250
16.9%
n 1159
15.7%
r 725
9.8%
d 625
8.5%
t 625
8.5%
s 100
1.4%
m 100
1.4%
l 100
1.4%

Most occurring blocks

Value Count Frequency (%)
ASCII 7393
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
i 1359
18.4%
f 1250
16.9%
e 1250
16.9%
n 1159
15.7%
r 725
9.8%
d 625
8.5%
t 625
8.5%
s 100
1.4%
m 100
1.4%
l 100
1.4%

attributes.positionOfAnswerer
Categorical

HIGH CORRELATION
MISSING

Physical proximity of answerer

Distinct 2
Distinct (%) 0.3%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
anywhere 610
nearby 115

Length

Max length 8
Median length 8
Mean length 7.682758621
Min length 6

Characters and Unicode

Total characters 5570
Distinct characters 8
Distinct categories 1 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row anywhere
2nd row anywhere
3rd row anywhere
4th row anywhere
5th row anywhere

Common Values

Value Count Frequency (%)
anywhere 610 12.7%
nearby 115 2.4%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
anywhere 610
84.1%
nearby 115
15.9%

Most occurring characters

Value Count Frequency (%)
e 1335
24.0%
a 725
13.0%
n 725
13.0%
y 725
13.0%
r 725
13.0%
w 610
11.0%
h 610
11.0%
b 115
2.1%

Most occurring categories

Value Count Frequency (%)
Lowercase Letter 5570
100.0%

Most frequent character per category

Lowercase Letter
Value Count Frequency (%)
e 1335
24.0%
a 725
13.0%
n 725
13.0%
y 725
13.0%
r 725
13.0%
w 610
11.0%
h 610
11.0%
b 115
2.1%

Most occurring scripts

Value Count Frequency (%)
Latin 5570
100.0%

Most frequent character per script

Latin
Value Count Frequency (%)
e 1335
24.0%
a 725
13.0%
n 725
13.0%
y 725
13.0%
r 725
13.0%
w 610
11.0%
h 610
11.0%
b 115
2.1%

Most occurring blocks

Value Count Frequency (%)
ASCII 5570
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
e 1335
24.0%
a 725
13.0%
n 725
13.0%
y 725
13.0%
r 725
13.0%
w 610
11.0%
h 610
11.0%
b 115
2.1%

attributes.maxUsers
Categorical

CONSTANT
MISSING
REJECTED

Number of users the question is forwarded to

Distinct 1
Distinct (%) 0.1%
Missing 4075
Missing (%) 84.9%
Memory size 37.6 KiB
15.0 725

Length

Max length 4
Median length 4
Mean length 4
Min length 4

Characters and Unicode

Total characters 2900
Distinct characters 4
Distinct categories 2 ?
Distinct scripts 1 ?
Distinct blocks 1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0 ?
Unique (%) 0.0%

Sample

1st row 15.0
2nd row 15.0
3rd row 15.0
4th row 15.0
5th row 15.0

Common Values

Value Count Frequency (%)
15.0 725 15.1%
(Missing) 4075 84.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value Count Frequency (%)
15.0 725
100.0%

Most occurring characters

Value Count Frequency (%)
1 725
25.0%
5 725
25.0%
. 725
25.0%
0 725
25.0%

Most occurring categories

Value Count Frequency (%)
Decimal Number 2175
75.0%
Other Punctuation 725
25.0%

Most frequent character per category

Decimal Number
Value Count Frequency (%)
1 725
33.3%
5 725
33.3%
0 725
33.3%
Other Punctuation
Value Count Frequency (%)
. 725
100.0%

Most occurring scripts

Value Count Frequency (%)
Common 2900
100.0%

Most frequent character per script

Common
Value Count Frequency (%)
1 725
25.0%
5 725
25.0%
. 725
25.0%
0 725
25.0%

Most occurring blocks

Value Count Frequency (%)
ASCII 2900
100.0%

Most frequent character per block

ASCII
Value Count Frequency (%)
1 725
25.0%
5 725
25.0%
. 725
25.0%
0 725
25.0%
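
Some of the alert badges in this report (MISSING, CONSTANT, and the empty-column case of goal.description) can be read off from just the row count, the distinct count and the missing count. The following is a simplified reading of those badges under stated assumptions, not the profiler's actual rule set: the function `summarize_alerts` and the label `EMPTY` are hypothetical names introduced here for illustration.

```python
N_ROWS = 4800


def summarize_alerts(n_rows, distinct, missing):
    """Hypothetical, simplified alert rules based on summary counts only."""
    alerts = []
    if missing > 0:
        alerts.append("MISSING")
    if missing == n_rows:
        # Every cell is missing, as for goal.description.
        alerts.append("EMPTY")
    elif distinct == 1:
        # A single repeated value among the non-missing cells.
        alerts.append("CONSTANT")
    return alerts


# (distinct, missing) per column, taken from this report. The distinct
# count of 0 for goal.description is implied by all 4800 cells being missing.
columns = {
    "goal.description": (0, 4800),
    "attributes.maxUsers": (1, 4075),
    "attributes.sensitive": (2, 4075),
}

alerts_by_column = {
    name: summarize_alerts(N_ROWS, distinct, missing)
    for name, (distinct, missing) in columns.items()
}

for name, alerts in alerts_by_column.items():
    print(name, alerts)
```

Alerts such as HIGH CORRELATION, HIGH CARDINALITY and UNIFORM depend on the underlying data and on profiler thresholds, so they are not covered by this sketch.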

Interactions

[Figure: interaction plots (36 panels)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient ( ρ ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y , one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
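
As an illustration of this definition, the sketch below computes ρ by ranking both variables and then applying the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the sample data are assumptions made for the example. Giving tied values their average rank is a common convention and is not stated in the report.

```python
from math import sqrt
from statistics import mean

def average_ranks(xs):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship (y = x cubed)
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman(xs, ys)
r = pearson(xs, ys)
print(rho, r)
```

Because the relationship is strictly increasing, the ranks of both variables coincide and ρ is 1 (up to floating-point rounding), while Pearson's r on the raw values stays below 1.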

Pearson's r

Pearson's correlation coefficient ( r ) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the slope of the line affects only the sign of r , not its magnitude.

To calculate r for two variables X and Y , one divides the covariance of X and Y by the product of their standard deviations.
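
The following sketch applies this formula directly and checks the invariance property described above. The function name `pearson_r` and the sample values are illustrative assumptions. The normalising constants of the covariance and the standard deviations cancel, so plain sums are used.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

r = pearson_r(xs, ys)

# Shifting and positively rescaling each variable separately leaves r unchanged
r_shifted = pearson_r([10 * x + 3 for x in xs], [0.5 * y - 7 for y in ys])

# Negating one variable flips the sign but not the magnitude
r_flipped = pearson_r(xs, [-y for y in ys])

print(r, r_shifted, r_flipped)
```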

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient ( τ ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y , one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
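
A direct reading of this definition gives the variant usually called τ-a, sketched below. The function name `kendall_tau_a` and the sample data are assumptions for illustration. Statistical libraries often compute a tie-corrected variant (τ-b) instead, so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        pairs += 1
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables order the pair the same way
        elif sign < 0:
            discordant += 1   # the variables order the pair in opposite ways
    return (concordant - discordant) / pairs

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]   # one swapped pair out of six
tau = kendall_tau_a(xs, ys)
print(tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.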

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
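
One way to read nullity correlation is as the Pearson correlation between the presence indicators of two columns (1 where a value is present, 0 where it is missing). The sketch below follows that reading. The function `nullity_correlation`, the use of `None` for missing cells and the toy columns are assumptions for illustration, not the report's actual implementation.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Correlation between presence indicators of two equally long columns.

    Returns None when either column is entirely present or entirely missing,
    because its indicator has no variance and the correlation is undefined.
    """
    a = [0 if v is None else 1 for v in col_a]
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return None
    return cov / sqrt(va * vb)

# Hypothetical toy columns
col_a = [1, None, 3, None]
col_b = ["x", None, "y", None]   # missing exactly where col_a is missing
col_c = [None, 2, None, 4]       # present exactly where col_a is missing
col_d = [1, 2, 3, 4]             # never missing

same = nullity_correlation(col_a, col_b)
opposite = nullity_correlation(col_a, col_c)
undefined = nullity_correlation(col_a, col_d)
print(same, opposite, undefined)
```

A value of 1 means the two columns are always present or missing together, and -1 means one is present exactly when the other is missing.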
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.